---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
model_name: ob11/Qwen-VL-PRM-7B
license: apache-2.0
datasets:
- ob11/VL-PRM300K-V1-train
---

# Model Summary

> Qwen-VL-PRM-7B is a process reward model fine-tuned from Qwen2.5-VL-7B-Instruct on approximately 300,000 examples. Despite being trained mainly on abstract and elementary reasoning datasets, it shows strong test-time scaling improvements on a range of advanced multimodal reasoning benchmarks.

- **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
- **Repository:** [ob11/vlprm](https://github.com/theogbrand/vlprm/)
- **Paper:** https://arxiv.org/abs/
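
As a rough illustration of the test-time scaling setup, the sketch below ranks candidate solutions by their step-level rewards. The `score_steps` helper is hypothetical (it stands in for a PRM call like the one sketched under "Use" below), and aggregating by the minimum step reward is one common PRM convention, not necessarily the one used here.

```python
# Hypothetical best-of-N selection with a process reward model.
# score_steps(image, question, steps) -> list[float] is assumed to return
# one reward per reasoning step.

def best_of_n(image, question, candidates, score_steps):
    """Pick the candidate whose weakest step scores highest.

    `candidates` is a list of solutions, each a list of reasoning steps.
    Aggregating by min is one common convention; mean or product of
    step rewards are also used in the literature.
    """
    best, best_score = None, float("-inf")
    for steps in candidates:
        rewards = score_steps(image, question, steps)
        score = min(rewards)  # a chain is only as good as its worst step
        if score > best_score:
            best, best_score = steps, score
    return best
```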

# Use

Model usage is documented in [VisualPRMv2.py](https://github.com/theogbrand/vlprm/blob/main/eval/tts_eval/reward_guided_search/VisualPRMv2.py) in the companion repository.
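
For orientation, here is a minimal sketch of loading the model with `transformers` and reading off a step reward. The prompt layout, the placeholder image URL, and the "+"/"-" judgment-token convention are assumptions based on common PRM practice; the linked script is the authoritative reference.

```python
# Minimal sketch: load the PRM and score one reasoning step.
# The "+"/"-" reward-token convention below is an assumption; consult the
# linked VisualPRMv2.py for the exact prompt and reward extraction.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/problem.png"},  # hypothetical image
            {"type": "text", "text": "Question: ...\n\nStep 1: ..."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Assumed convention: step reward = P("+") vs. P("-") at the next position.
plus_id = processor.tokenizer.convert_tokens_to_ids("+")
minus_id = processor.tokenizer.convert_tokens_to_ids("-")
step_reward = torch.softmax(logits[0, -1, [plus_id, minus_id]], dim=-1)[0]
print(f"P(step correct) = {step_reward.item():.3f}")
```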

### Framework versions

- TRL: 0.19.1
- Transformers: 4.55.3
- Pytorch: 2.7.1
- Datasets: 3.0.1
- Tokenizers: 0.21.4

## Citations
    
```bibtex
@misc{ong2025vlprms,
      title={VL-PRMs: Vision-Language Process Reward Models}, 
      author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
      year={2025},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={}, 
}
```