I've been working on something cool: a GRPO implementation with an LLM evaluator that can optionally also perform SFT on the feedback data. Check it out:
Any feedback is more than welcome.
https://github.com/mkurman/grpo-llm-evaluator