---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
model_name: ob11/Qwen-VL-PRM-7B
license: apache-2.0
datasets:
- ob11/VL-PRM300K-V1-train
---

# Model Summary

> Qwen-VL-PRM-7B is a process reward model finetuned from Qwen2.5-VL-7B-Instruct on approximately 300,000 examples. It shows strong test-time scaling gains on a range of advanced multimodal reasoning benchmarks, despite being trained mainly on abstract and elementary reasoning datasets.

- **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
- **Repository:** [ob11/vlprm](https://github.com/theogbrand/vlprm/)
- **Paper:** https://arxiv.org/abs/

# Use

Model usage is documented [here](https://github.com/theogbrand/vlprm/blob/main/eval/tts_eval/reward_guided_search/VisualPRMv2.py); a minimal scoring sketch is also given at the end of this card.

### Framework versions

- TRL: 0.19.1
- Transformers: 4.55.3
- PyTorch: 2.7.1
- Datasets: 3.0.1
- Tokenizers: 0.21.4

## Citations

```bibtex
@misc{ong2025vlprms,
    title={VL-PRMs: Vision-Language Process Reward Models},
    author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
    year={2025},
    eprint={},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={},
}
```
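
## Example: scoring a reasoning step (sketch)

The snippet below is a minimal sketch of how a process reward model like this one is typically queried, not the exact code from the linked `VisualPRMv2.py`. It assumes the PRM judges one reasoning step at a time and that the probability of a `+` continuation is read off as the step reward; the prompt wording, the `+`/`-` scoring tokens, and the `score_step` helper are all illustrative assumptions, so consult the linked script for the actual interface.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def score_step(image, question: str, steps_so_far: list[str]) -> float:
    """Return P('+') for the latest step, used as its process reward."""
    # Hypothetical prompt layout; the real template lives in the repo.
    prompt = (
        question
        + "\n"
        + "\n".join(steps_so_far)
        + "\nIs the last step correct? Answer + or -."
    )
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[text], images=[image], return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    # Assumed scoring tokens: renormalize over '+' and '-' only.
    plus_id = processor.tokenizer.convert_tokens_to_ids("+")
    minus_id = processor.tokenizer.convert_tokens_to_ids("-")
    probs = torch.softmax(logits[[plus_id, minus_id]], dim=-1)
    return probs[0].item()
```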
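
On top of per-step scores, test-time scaling methods such as best-of-N pick the sampled trajectory with the highest aggregated reward. The sketch below uses min-aggregation over step rewards, one common choice; the repository's reward-guided search may aggregate differently.

```python
# Sketch: best-of-N selection on top of score_step (defined above).
# `candidates` are complete reasoning traces sampled from a separate
# policy model (not shown); each trace is a list of step strings.
def select_best(image, question: str, candidates: list[list[str]]) -> list[str]:
    def trajectory_reward(steps: list[str]) -> float:
        # Min over step rewards penalizes any single bad step; sum or
        # last-step reward are alternative aggregations.
        return min(
            score_step(image, question, steps[: i + 1])
            for i in range(len(steps))
        )
    return max(candidates, key=trajectory_reward)
```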