hdong0/deepseek-Qwen2.5-Math-1.5B-Open-R1-GRPO_100steps_lr1e-6_acc Text Generation • Updated 12 days ago • 15
hdong0/deepseek-Qwen2.5-Math-1.5B-Open-R1-GRPO_deepscaler_1000steps_lr1e-6_acc Text Generation • Updated 8 days ago • 50