Training & test sets and finetuned models
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models (37)
RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy • 2B • 37
RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy • 2B • 25
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard • 9
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy • 2B • 16
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-easy • 8B • 15
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-hard • 8B • 13
RLHFlow/Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard • 4B • 14
RLHFlow/Llama-3.2-3B-Instruct-Reinforce-Ada-balance-hard • 4B • 10
RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp • Text Generation • 8B • 1
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej • Text Generation • 8B • 3 • 1
datasets (88)
RLHFlow/reinforce_ada_hard_prompt_1-5b • 13.3k • 26
RLHFlow/reinforce_ada_simple_prompt_1-5b • 25k • 42
RLHFlow/reinforce_ada_hard_prompt_llama • 15k • 25
RLHFlow/reinforce_ada_easy_prompt • 24.3k • 30
RLHFlow/reinforce_ada_hard_prompt • 15.7k • 111
RLHFlow/self_rewarding_turn2_example • 6
RLHFlow/self_rewarding_turn1_with_rewards_example • 9
RLHFlow/self_rewarding_rl_prompt • 13
RLHFlow/self_rewarding_sft_prompt • 40k • 7
RLHFlow/self_rewarding_ift_example_raw_data1 • 16.3k • 6