kz919/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-Cautious-TRL-0.18.0.dev Text Generation • 2B • Updated Jun 9 • 3 • 1
hdong0/Qwen2.5-Math-1.5B-Open-R1-GRPO_openr1_100steps_lr1e-6_acc Text Generation • 2B • Updated May 28 • 6