Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
Paper • 2510.04996
Training & test sets and finetuned models
Note Prompt set used for data processing
Note Selected hard prompts used to train Qwen2.5-Math-7B and Qwen3-4B-Instruct-2507
Note Selected easy prompts used to train Qwen2.5-Math-7B
Note Selected hard prompts used to train Llama-3.2-3B-Instruct
Note Checkpoint from step=400, trained on the hard prompt set
Note Checkpoint from step=400, trained on the hard prompt set
Note Checkpoint from step=400, trained on the hard prompt set
Note Checkpoint from step=500, trained on the easy prompt set
Note Selected easy prompts used to train Qwen2.5-Math-1.5B
Note Selected hard prompts used to train Qwen2.5-Math-1.5B