Have you compared your GRPO model with a fine-tuned model?
In the DeepSeek paper, they say that fine-tuning followed by GRPO is much more effective, with the fine-tuning stage serving to warm up the model.
Also, I think an evaluation of GRPO vs. fine-tuning could be meaningful.
Taeyup Kim
terrykim0404