---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
model_name: ob11/Qwen-VL-PRM-7B
license: apache-2.0
datasets:
- ob11/VL-PRM300K-V1-train
---
## Model Summary
Qwen-VL-PRM-7B is a process reward model finetuned from Qwen2.5-VL-7B-Instruct on approximately 300,000 examples. It delivers strong test-time scaling improvements on a range of advanced multimodal reasoning benchmarks, despite being trained mainly on abstract and elementary reasoning datasets.
- Logs: https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
- Repository: ob11/vlprm
- Paper: https://arxiv.org/abs/
## Use
Model usage is documented in the `ob11/vlprm` repository.
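As a minimal sketch, the model can be loaded with `transformers` and used to score intermediate reasoning steps. The step-scoring protocol below (reading the probability of a `+` versus `-` judgment token after each step) is an assumption for illustration only; consult the `ob11/vlprm` repository for the exact prompt format used during training.

```python
# Hedged sketch: scoring reasoning steps with Qwen-VL-PRM-7B.
# The "+"/"-" judgment-token protocol here is an ASSUMPTION for illustration;
# see the ob11/vlprm repository for the actual format.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def score_steps(image, question, steps):
    """Return one (assumed) correctness score per reasoning step."""
    plus = processor.tokenizer.convert_tokens_to_ids("+")
    minus = processor.tokenizer.convert_tokens_to_ids("-")
    scores = []
    for i in range(len(steps)):
        # Present the question plus the solution prefix up to step i.
        messages = [{
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text",
                 "text": question + "\n" + "\n".join(steps[: i + 1])},
            ],
        }]
        text = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = processor(
            text=[text], images=[image], return_tensors="pt"
        ).to(model.device)
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]
        # P("+") over {"+", "-"} as the process reward for this step.
        probs = torch.softmax(logits[[plus, minus]], dim=-1)
        scores.append(probs[0].item())
    return scores
```

Higher scores would indicate steps the PRM judges more likely to be correct, which can then drive test-time scaling (e.g. best-of-N selection over candidate solution traces).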
## Framework versions
- TRL: 0.19.1
- Transformers: 4.55.3
- PyTorch: 2.7.1
- Datasets: 3.0.1
- Tokenizers: 0.21.4
## Citation
```bibtex
@misc{ong2025vlprms,
    title={VL-PRMs: Vision-Language Process Reward Models},
    author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
    year={2025},
    eprint={},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={},
}
```