RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter3 • Text Generation • 3B • Updated Aug 11 • 5
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter2 • Text Generation • 3B • Updated Aug 11 • 6
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter1 • Text Generation • 3B • Updated Aug 11 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.5-sppo-reversekl-table • Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.05-sppo-reversekl-table • Text Generation • 8B • Updated Jul 30 • 7
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.05-sppo-reversekl-table • Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table • Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.5-sppo-reversekl-table • Text Generation • 8B • Updated Jul 30 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.5-sppo-reversekl-table • Text Generation • 8B • Updated Jul 30 • 5