Jimmy19991222/llama-3-8b-instruct-gapo-v2-bleu-beta10-gamma0.3-lr1.0e-6-he_scale-rerun • Text Generation • 8B • Updated Sep 9, 2024 • 5
Jimmy19991222/llama-3-8b-instruct-gapo-v2-jaccard_score-beta10-gamma0.3-lr1.0e-6-he_scale-rerun • Text Generation • 8B • Updated Sep 9, 2024 • 5
Jimmy19991222/llama-3-8b-instruct-gapo-v2-rouge1-beta10-gamma0.3-lr1.0e-6-he_scale-rerun • Text Generation • 8B • Updated Sep 9, 2024 • 7
Jimmy19991222/llama-3-8b-instruct-gapo-v2-rouge2-beta10-gamma0.3-lr1.0e-6-he_scale-rerun • Text Generation • 8B • Updated Sep 9, 2024 • 6
CharlesLi/OpenELM-1_1B-DPO-full-max-reward-least-similar • Text Generation • 1B • Updated Oct 3, 2024 • 5
CharlesLi/OpenELM-1_1B-DPO-full-max-reward-most-similar • Text Generation • 1B • Updated Oct 3, 2024 • 3