---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
model_name: ob11/Qwen-VL-PRM-7B
license: apache-2.0
datasets:
  - ob11/VL-PRM300K-V1-train
---

# Qwen-VL-PRM-7B

## Model Summary

Qwen-VL-PRM-7B is a process reward model fine-tuned from Qwen2.5-VL-7B-Instruct on approximately 300,000 examples (VL-PRM300K-V1-train). It delivers strong test-time scaling improvements on a range of advanced multimodal reasoning benchmarks, despite being trained primarily on abstract and elementary reasoning datasets.
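
For context on the test-time scaling claim: a process reward model scores each intermediate step of several sampled solutions, and the candidate with the best aggregated step scores is kept. The sketch below is a generic illustration of PRM-guided best-of-N selection using min-aggregation (one common choice); it is not the exact selection protocol used for this model.

```python
# Generic illustration (not the authors' exact protocol) of PRM-guided
# best-of-N selection: each sampled solution has one PRM score per reasoning
# step, and the solution whose weakest step scores highest is selected.
from typing import List

def select_best_candidate(step_scores_per_candidate: List[List[float]]) -> int:
    """Return the index of the candidate with the strongest weakest step."""
    aggregated = [
        min(scores) if scores else float("-inf")
        for scores in step_scores_per_candidate
    ]
    return max(range(len(aggregated)), key=aggregated.__getitem__)

# Example: step-level PRM scores for three sampled solutions.
candidates = [
    [0.9, 0.4, 0.8],   # one weak intermediate step
    [0.7, 0.7, 0.6],   # uniformly decent reasoning
    [0.95, 0.2, 0.9],  # confident start, likely error mid-way
]
print(select_best_candidate(candidates))  # -> 1
```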

## Use

Model usage is documented here; a minimal loading sketch is shown below.
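
The sketch below is an assumption-laden example rather than the documented procedure: it assumes the checkpoint loads as a standard Qwen2.5-VL model in transformers, and the prompt asking the model to judge a single reasoning step (with a placeholder image and a Yes/No verdict) is purely illustrative. Refer to the documentation above for the intended prompt format and scoring protocol.

```python
# Minimal sketch: load the checkpoint as a regular Qwen2.5-VL model and ask it
# to judge one reasoning step. The prompt wording, image URL, and Yes/No
# verdict format are illustrative assumptions, not the documented protocol.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            # Public COCO demo image, used here only as a placeholder.
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {
                "type": "text",
                "text": (
                    "Question: How many animals are in the image?\n"
                    "Step 1: There are two cats lying on the couch.\n"
                    "Is this step correct? Answer Yes or No."
                ),
            },
        ],
    }
]

# apply_chat_template handles image loading, tokenization, and tensor packing.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8)

# Decode only the newly generated tokens (the model's step verdict).
verdict = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(verdict)
```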

## Framework versions

- TRL: 0.19.1
- Transformers: 4.55.3
- PyTorch: 2.7.1
- Datasets: 3.0.1
- Tokenizers: 0.21.4

## Citations

@misc{ong2025vlprms,
      title={VL-PRMs: Vision-Language Process Reward Models},
      author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
      year={2025},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={},
}