RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

Chenlu123 submitted a paper 24 days ago

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

baohao submitted a paper 2 months ago

Self-Hinting Language Models Enhance Reinforcement Learning

baohao updated a collection 6 months ago

Reinforce-Ada

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Collections 12

models 37

datasets 88

RLHFlow/reinforce_ada_hard_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 13.3k • 23

RLHFlow/reinforce_ada_simple_prompt_1-5b

Viewer • Updated Oct 16, 2025 • 25k • 14

RLHFlow/reinforce_ada_hard_prompt_llama

Viewer • Updated Oct 10, 2025 • 15k • 20

RLHFlow/reinforce_ada_easy_prompt

Viewer • Updated Oct 10, 2025 • 24.3k • 14

RLHFlow/reinforce_ada_hard_prompt

Viewer • Updated Oct 10, 2025 • 15.7k • 24 • 2

RLHFlow/self_rewarding_turn2_example

Updated Mar 2, 2025 • 3

RLHFlow/self_rewarding_turn1_with_rewards_example

Updated Mar 2, 2025 • 11

RLHFlow/self_rewarding_rl_prompt

Updated Mar 2, 2025 • 11

RLHFlow/self_rewarding_sft_prompt

Viewer • Updated Mar 2, 2025 • 40k • 11

RLHFlow/self_rewarding_ift_example_raw_data1

Viewer • Updated Feb 26, 2025 • 16.3k • 3

Collections 12

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

weqweasdas/math500

weqweasdas/aime_hmmt_brumo_cmimc_amc23

weqweasdas/olympiadbench

RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp

RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej

models 37

RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy

RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard

RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy

RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-easy

RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-hard

RLHFlow/Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard

RLHFlow/Llama-3.2-3B-Instruct-Reinforce-Ada-balance-hard

RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp

RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej

Team members 9
