Qwen-2.5-3B Reward Model

This is a 3B-parameter reward model fine-tuned from Qwen/Qwen2.5-3B on Anthropic's HH-RLHF preference data. Given a prompt and a candidate response, it produces a scalar score reflecting alignment and quality, and it can be evaluated with RewardBench. Weights are distributed as BF16 safetensors.

Eval Results (RewardBench)

Category     Score
Chat         83.5%
Chat Hard    53.2%
Safety       72.2%
Reasoning    73.4%

Sub-benchmarks

Per-subset accuracy (0-1 scale) on selected RewardBench subsets:

  • alpacaeval-easy: 0.82
  • alpacaeval-hard: 0.874
  • hep-python: 0.835
  • mt-bench-easy: 0.893
  • refusals-offensive: 0.91
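
To run the evaluation yourself, the RewardBench harness can score the model directly. A minimal sketch, assuming the rewardbench pip package from allenai/reward-bench (flag names may differ across versions; check its README for the current interface):

pip install rewardbench
rewardbench --model=kanishkez/Reward-Model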

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the tokenizer and the reward model (a sequence classifier whose head
# produces the scalar reward score).
tokenizer = AutoTokenizer.from_pretrained("kanishkez/Reward-Model")
model = AutoModelForSequenceClassification.from_pretrained("kanishkez/Reward-Model")
model.eval()
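
Continuing from the snippet above, here is a minimal sketch of ranking two candidate responses to the same prompt. It assumes the tokenizer ships a chat template and that the classification head emits a single logit used as the reward; the example conversation is hypothetical.

import torch

# Hypothetical prompt with two candidate responses to rank.
prompt = {"role": "user", "content": "How do I make a cup of tea?"}
candidates = [
    "Boil water, steep the tea for 3-5 minutes, then add milk or sugar to taste.",
    "I don't know.",
]

scores = []
for answer in candidates:
    messages = [prompt, {"role": "assistant", "content": answer}]
    # apply_chat_template formats the conversation the way the model saw it
    # during fine-tuning (assumes a chat template is set on the tokenizer).
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Single-logit head: the raw logit is taken as the reward (assumption).
        scores.append(model(**inputs).logits[0, 0].item())

for answer, score in zip(candidates, scores):
    print(f"{score:+.3f}  {answer}")

A higher score should correspond to the response the model considers more helpful and harmless, matching how HH-RLHF preference pairs are labeled.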