# Qwen-2.5-3B Reward Model
This is a 3B-parameter reward model fine-tuned from Qwen 2.5 3B on the Anthropic HH-RLHF preference dataset.
It is designed to score model outputs for alignment and quality, and can be evaluated with RewardBench.
## Eval Results (RewardBench)
| Category | Score |
|---|---|
| Chat | 83.5% |
| Chat Hard | 53.2% |
| Safety | 72.2% |
| Reasoning | 73.4% |
### Sub-benchmark scores
- alpacaeval-easy: 0.82
- alpacaeval-hard: 0.874
- hep-python: 0.835
- mt-bench-easy: 0.893
- refusals-offensive: 0.91
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kanishkez/Reward-Model")
model = AutoModelForSequenceClassification.from_pretrained("kanishkez/Reward-Model")
```
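A minimal scoring sketch follows. It assumes the reward head is a standard single-logit sequence-classification head and that the model scores a plain-text concatenation of prompt and response; the example prompt, responses, and the `reward_score` helper are illustrative assumptions, not taken from the model card.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the reward model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("kanishkez/Reward-Model")
model = AutoModelForSequenceClassification.from_pretrained("kanishkez/Reward-Model")
model.eval()

def reward_score(prompt: str, response: str) -> float:
    """Return a scalar reward for a prompt/response pair.

    Assumes a single-logit classification head; the plain-text
    concatenation format is an assumption, not documented behavior.
    """
    text = f"{prompt}\n\n{response}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, 0].item()

# Compare two candidate responses; the preferred one should receive the higher score
prompt = "How do I safely dispose of old batteries?"
good = "Take them to a local battery-recycling drop-off point rather than the trash."
bad = "Just throw them in the bin, it doesn't matter."
print(reward_score(prompt, good), reward_score(prompt, bad))
```

Depending on how the preference data was formatted during fine-tuning, applying the tokenizer's chat template may be more appropriate than plain concatenation.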
## Base model

Qwen/Qwen2.5-3B