---
license: apache-2.0
datasets:
  - Code2Logic/GameQA-140K
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
---

This model (GameQA-Qwen2.5-VL-7B) was obtained by training Qwen2.5-VL-7B-Instruct with GRPO solely on our GameQA dataset.
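If the checkpoint keeps the standard Qwen2.5-VL chat interface in recent versions of `transformers`, inference should look roughly like the sketch below. The repository id, image path, and prompt are placeholders, not the definitive usage for this release.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image

model_id = "Code2Logic/GameQA-Qwen2.5-VL-7B"  # placeholder repo id; replace with the actual checkpoint path

# Load the fine-tuned model and its processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder game-state image and question
image = Image.open("game_screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is the best next move here? Explain your reasoning."},
        ],
    }
]

# Build the chat prompt, bundle the image, and generate an answer
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens before decoding the model's reply
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```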

Evaluation Results on General Vision Benchmarks

We also find that training on 5k samples from our GameQA dataset leads to better results than training on multimodal-open-r1-8k-verified.

Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning

This is the first work, to the best of our knowledge, that leverages game code to synthesize multimodal reasoning data for training VLMs. Furthermore, when trained with a GRPO strategy solely on GameQA (synthesized via our proposed Code2Logic approach), multiple cutting-edge open-source models exhibit significantly enhanced out-of-domain generalization.

[πŸ“– Paper] [πŸ€— GameQA-140K Dataset] [πŸ€— GameQA-InternVL3-8B] [πŸ€— GameQA-Qwen2.5-VL-7B] [πŸ€— GameQA-LLaVA-OV-7B]
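The GameQA training data listed in this card's metadata can be inspected directly from the Hub with the `datasets` library. Split names and column layout are whatever the dataset repository defines, so the sketch below only prints them rather than assuming them.

```python
from datasets import load_dataset

# Download the GameQA-140K dataset referenced in the card metadata
ds = load_dataset("Code2Logic/GameQA-140K")

print(ds)  # available splits, their sizes, and column names

# Peek at one sample from the first split
first_split = next(iter(ds.values()))
print(first_split[0])
```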

News

  • We've open-sourced the three models trained with GRPO on GameQA on Huggingface.