Update README.md
README.md CHANGED
@@ -1,36 +1,23 @@
 ---
 base_model: Qwen/Qwen2.5-VL-7B-Instruct
 library_name: transformers
-model_name:
-
-
-
-- trl
-licence: license
+model_name: ob11/Qwen-VL-PRM-7B
+license: apache-2.0
+datasets:
+- ob11/VL-PRM300K-V1-train
 ---
 
-# Model
+# Model Summary
 
-
-It has been trained using [TRL](https://github.com/huggingface/trl).
+> Qwen-VL-PRM-7B is a process reward model finetuned from Qwen2.5-VL-7B-Instruct on approximately 300,000 examples. It shows strong test-time scaling improvements on a range of advanced multimodal reasoning benchmarks, despite being trained mainly on abstract and elementary reasoning datasets.
 
-
+- **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
+- **Repository:** [ob11/vlprm](https://github.com/theogbrand/vlprm/)
+- **Paper:** https://arxiv.org/abs/
 
-
-from transformers import pipeline
+# Use
 
-
-generator = pipeline("text-generation", model="None", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-
-## Training procedure
-
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh)
-
-
-This model was trained with SFT.
+Model usage is documented [here](https://github.com/theogbrand/vlprm/blob/main/eval/tts_eval/reward_guided_search/VisualPRMv2.py).
 
 ### Framework versions
 
@@ -41,18 +28,15 @@ This model was trained with SFT.
 - Tokenizers: 0.21.4
 
 ## Citations
-
-
-
-Cite TRL as:
 
 ```bibtex
-@misc{
-
-
-
-
-
-
+@misc{ong2025vlprms,
+      title={VL-PRMs: Vision-Language Process Reward Models},
+      author={Brandon Ong and Tej Deep Pala and Vernon Toh and William Chandra Tjhi and Soujanya Poria},
+      year={2025},
+      eprint={},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={},
 }
 ```
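
For orientation, below is a minimal scoring sketch rather than the repository's implementation: it assumes the PRM judges one reasoning step at a time by comparing the next-token probabilities of a "+" (correct) versus "-" (incorrect) judgement token, and the prompt wording, the example step text, and the `problem.png` input are illustrative placeholders. The authoritative scoring and reward-guided search logic lives in the VisualPRMv2.py file linked in the Use section.

```python
# Hypothetical quick-start sketch; the real protocol is defined in
# eval/tts_eval/reward_guided_search/VisualPRMv2.py of the ob11/vlprm repo.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ob11/Qwen-VL-PRM-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative inputs: a multimodal problem plus one candidate reasoning step.
image = Image.open("problem.png")  # placeholder image path
question = "What is the area of the shaded region?"
step = "Step 1: The shaded region is a quarter of a circle with radius 4."

# Assumed prompt format: ask the model to judge the step with "+" or "-".
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text",
         "text": f"{question}\n\n{step}\n\nIs this step correct? Answer + or -."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for the judgement token

tok = processor.tokenizer
plus_id = tok.encode("+", add_special_tokens=False)[0]
minus_id = tok.encode("-", add_special_tokens=False)[0]
p_correct = torch.softmax(next_token_logits[[plus_id, minus_id]], dim=-1)[0].item()
print(f"P(step is correct) ~ {p_correct:.3f}")
```

In reward-guided search, per-step scores like this one are used to rank or prune candidate continuations at each round, which is how the test-time scaling gains mentioned in the model summary are typically realized.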