pristinawang/ppo-smalldata-flan-t5-ppo-finetuned Reinforcement Learning • 0.2B • Updated Dec 12, 2024 • 4
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-fix-reward-norm-dongnan Reinforcement Learning • Updated Jan 10 • 3
tzwilliam0/maxmin-dpo-init-kl-coef-0.5-fix-reward-norm-dongnan Reinforcement Learning • Updated Jan 10 • 3