AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Collections
- Training & test sets and finetuned models
- The online-DPO-R1 project.
- Datasets and models for process reward modeling.
- Open-source datasets collected and processed into a standard format.
- The mixture of preference datasets used for reward modeling.
- Reward models trained by maximum likelihood estimation of the Bradley-Terry model; the objective is written out just after this list.
- Materials for training pairwise preference models; see the scoring sketch after the model list below.
- Reward models trained with the RLHFlow codebase (https://github.com/RLHFlow/RLHF-Reward-Modeling/).
- Datasets, code, and models for online RLHF (i.e., iterative DPO); see the loss sketch after the paper entry below.
- A series of SFT models trained on RLHFlow's high-quality SFT dataset for research purposes.
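
The maximum-likelihood training mentioned above has a standard closed-form objective: for a prompt x with chosen response y_w and rejected response y_l, the Bradley-Terry model and its negative log-likelihood are

```latex
% Bradley-Terry model: probability that the chosen response y_w beats the
% rejected response y_l for prompt x, under a parametric reward r_theta.
\[
  P(y_w \succ y_l \mid x) = \sigma\!\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\]
% Maximum likelihood estimation over a preference dataset D amounts to
% minimizing the negative log-likelihood:
\[
  \mathcal{L}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \Big[ \log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big) \Big]
\]
```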

Models
- RLHFlow/ArmoRM-Llama3-8B-v0.1 (Text Classification • 8B • Updated • 11.1k • 182)
- RLHFlow/pair-preference-model-LLaMA3-8B (Text Generation • 8B • Updated • 8 • 38)
- sfairXC/FsfairX-LLaMA3-RM-v0.1 (Text Classification • 8B • Updated • 1.19k • 60)
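
The Bradley-Terry-style reward models above assign a single scalar score to a whole conversation. A minimal scoring sketch, assuming the model loads through transformers' AutoModelForSequenceClassification with a one-logit head and that the tokenizer ships a chat template; check the individual model cards for the officially recommended usage.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A minimal sketch, assuming the reward model exposes the standard
# sequence-classification interface with a single scalar logit; see the
# model card for the recommended usage.
model_name = "sfairXC/FsfairX-LLaMA3-RM-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
# Render the conversation with the model's chat template, then score it.
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
with torch.no_grad():
    reward = model(input_ids).logits[0][0].item()  # higher = more preferred
print(f"reward: {reward:.3f}")
```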
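
The pairwise preference model in the list compares two candidate responses directly instead of scoring each in isolation: the LM is prompted with both candidates and the preference is read off the next-token probabilities of "A" versus "B". A sketch of that idea; the inline prompt here is a simplified stand-in, not the actual template from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of pairwise preference scoring: prompt a causal LM with both
# candidates and compare next-token probabilities of "A" vs "B". The prompt
# below is a hypothetical stand-in; use the template from the
# RLHFlow/pair-preference-model-LLaMA3-8B model card in practice.
model_name = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "[CONTEXT] What is 2 + 2?\n"
    "[RESPONSE A] 2 + 2 = 4.\n"
    "[RESPONSE B] 2 + 2 = 5.\n"
    "Which response is better? Answer A or B: "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

token_a = tokenizer.convert_tokens_to_ids("A")
token_b = tokenizer.convert_tokens_to_ids("B")
prob_a = torch.softmax(logits[[token_a, token_b]], dim=-1)[0].item()
print(f"P(A preferred) = {prob_a:.3f}")
```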

Papers
RLHF Workflow: From Reward Modeling to Online RLHF (arXiv:2405.07863) • Published • 71
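
The iterative DPO pipeline referenced in the collections and in this paper alternates between collecting fresh preference pairs from the current policy and minimizing the DPO objective on them. A minimal sketch of the per-pair loss (standard DPO, Rafailov et al., 2023); the sequence log-probabilities in the usage example are toy values.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi(y_w | x) under the policy
    policy_rejected_logps: torch.Tensor,  # log pi(y_l | x) under the policy
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x) under the reference
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x) under the reference
    beta: float = 0.1,
) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    In iterative (online) DPO, new preference pairs are collected from the
    current policy each round and this loss is minimized again on them.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry likelihood of the observed preference under the implicit
    # reward r(x, y) = beta * log(pi(y|x) / pi_ref(y|x)).
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up sequence log-probabilities for a batch of 2 pairs.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -8.5]),
    policy_rejected_logps=torch.tensor([-14.0, -9.0]),
    ref_chosen_logps=torch.tensor([-12.5, -8.8]),
    ref_rejected_logps=torch.tensor([-13.5, -9.1]),
)
print(loss.item())
```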