ALaRM: Align Language Models via Hierarchical Rewards Modeling
Paper: https://arxiv.org/abs/2403.06754
This is the trained SFT policy for the machine translation (MT) task from the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling".
Check out our project page for more information.
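Below is a minimal loading sketch with 🤗 Transformers. The repo id `your-org/alarm-sft-mt` is a hypothetical placeholder, not the actual checkpoint name, and the snippet assumes a standard encoder-decoder (seq2seq) translation checkpoint; adjust the model class and repo id to match the released weights.

```python
# Minimal usage sketch (assumptions: placeholder repo id,
# standard seq2seq translation checkpoint).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "your-org/alarm-sft-mt"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a source sentence with greedy decoding.
inputs = tokenizer("Ein Beispielsatz auf Deutsch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```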