Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning Paper • 2506.06632 • Published Mar 16
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published Jan 28 • 120 • 20
shubhamprshr/Qwen2.5-1.5B-Instruct_gsm8k_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 20, 2025 • 9
shubhamprshr/Qwen2.5-1.5B-Instruct_gsm8k_grpo_gaussian_0.25_0.75_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 20, 2025 • 3
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 5
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 8
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 3
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 5