---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
model_name: ob11/Qwen-VL-PRM-7B
license: apache-2.0
datasets:
- ob11/VL-PRM300K-V1-train
---
# Model Summary
> Qwen-VL-PRM-7B is a process reward model finetuned from Qwen2.5-VL-7B-Instruct on approximately 300,000 examples. It shows strong test-time scaling improvements on a range of advanced multimodal reasoning benchmarks, despite being trained mainly on abstract and elementary reasoning datasets.
- **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
- **Repository:** [ob11/vlprm](https://github.com/theogbrand/vlprm/)
- **Paper:** https://arxiv.org/abs/
# Use
The model usage is documented [here](https://github.com/theogbrand/vlprm/blob/main/eval/tts_eval/reward_guided_search/VisualPRMv2.py).
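For a quick sense of the pattern, the sketch below loads the model with `transformers` and scores a partial solution. This is a minimal illustration under assumptions: the prompt wording, the question/step format, and the `+`/`-` judgement-token scoring are hypothetical here, so defer to the linked script for the exact protocol.
```python
# Minimal sketch of step-level reward scoring -- NOT the reference protocol.
# The prompt wording and the "+"/"-" judgement tokens below are assumptions;
# see the linked VisualPRMv2.py for the exact implementation.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def score_steps(image: Image.Image, question: str, steps: list[str]) -> float:
    """Return the model's probability that the latest step is correct."""
    prompt = (
        f"{question}\n" + "\n".join(steps)
        + "\nIs the last step correct? Answer + or -."
    )
    messages = [{
        "role": "user",
        "content": [{"type": "image"}, {"type": "text", "text": prompt}],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    plus = processor.tokenizer.convert_tokens_to_ids("+")
    minus = processor.tokenizer.convert_tokens_to_ids("-")
    probs = torch.softmax(logits[[plus, minus]], dim=-1)
    return probs[0].item()
```
In a reward-guided search or best-of-N loop, candidate next steps can then be ranked by this score.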
## Framework versions
- TRL: 0.19.1
- Transformers: 4.55.3
- Pytorch: 2.7.1
- Datasets: 3.0.1
- Tokenizers: 0.21.4
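To reproduce this environment, pin the versions above when installing, e.g. `pip install trl==0.19.1 transformers==4.55.3 torch==2.7.1 datasets==3.0.1 tokenizers==0.21.4`.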
## Citations
```bibtex
@misc{ong2025vlprms,
  title={VL-PRMs: Vision-Language Process Reward Models},
  author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
  year={2025},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={},
}
```