---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- TIGER-Lab/ViRL39K
license: mit
library_name: transformers
pipeline_tag: video-text-to-text
tags:
- lvlm
- reasoning
- multimodal
- qwen
---


# Spark-VL-7B

⭐ If you find our code or model helpful, please consider giving us a star. Your support means a lot!

🏠Github repository 📖Daily Paper 🤗models 📖Paper

## Introduction

We propose **SPARK**, **a unified framework that integrates policy and reward into a single model for joint and synchronous training**. SPARK automatically derives reward and reflection data from verifiable rewards, enabling **self-learning** and **self-evolution**. We instantiate this framework on multiple backbones, training SPARK-VL-7B, SPARK-7B, and SPARK-VL-32B. This repository hosts **SPARK-VL-7B**.

## 📢 News

- 🚀 [09/29/2025] We release our 🤗datasets.
- 🚀 [09/29/2025] We release the **Spark** 📖Paper.
- 🚀 [09/29/2025] We upload our evaluation code and 🤗models.
- 🚀 [09/29/2025] We release the **Spark** 🏠Github repository.

## 💡 Highlights

- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes policy and reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model.
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.

## 🛠️ Usage

### 🤗 Using Transformers

Our model is based on Qwen2.5-VL-7B-Instruct.
You can use the same inference code as for Qwen2.5-VL-7B-Instruct; see 🤗Huggingface.

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

# Placeholders: replace with your own image and question
image_path = "path/to/your/image.jpg"
prompt = "Describe this image."

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move inputs to the device that holds the model's first layers
inputs = inputs.to(model.device)

# Inference: generate the output, then strip the prompt tokens from each sequence
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

### 🔦 Using vLLM

We recommend **vLLM** for faster inference; it significantly speeds up dataset evaluation.

```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=/internlm/Spark-VL-7B

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
    --tensor-parallel-size 4 \
    --served-model-name "$SERVE_NAME" \
    --port "$PORT" \
    --max-num-seqs "$N_PROC"
```

## Training

### Spark Training

After downloading the dataset, you can start training with the following example bash script. Our bash scripts are in `/Spark/Lmm_XC/XC/scripts/spark_training`.

You need to change the dataset and model paths to your own locations.
```bash
export WORKSPACE_DIR="/fs-computility/....../Lmm_XC"                      # Path to project root directory
export DATASET_PATH="/fs-computility/....../infer_data_ViRL_19k.json"     # Path to your dataset
export PRETRAIN_MODEL_PATH="/fs-computility/....../Qwen2.5-VL-7B-Instruct"  # Path to pretrained model
export WANDB_PROJECT="Observation"        # Name for this project
export MODEL_CPK_NAME="Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2"  # Name for this training run
export LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt'  # Log file save path

export WANDB_API_KEY="......"
export SAVE_PATH="/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}"  # Absolute path to save everything about this training run
export CKPT_PATH="${SAVE_PATH}/ckpt"                    # Path to save checkpoints
export FINAL_CKPT_PATH="${SAVE_PATH}/final_ckpt"        # Path to save final checkpoints
export TIMESTAMP=$(date +%Y%m%d_%H%M%S)                 # Timestamp
export CUR_LOG_DIR="${SAVE_PATH}/training_logs/${TIMESTAMP}"  # Path to save current run logs
export LOG_DIR="${SAVE_PATH}/tb_logs"
```

⏰ Attention:

```bash
export DEV_MODE=0  # Set to 1 for debug mode on single dev machine
```

## Evaluation

The integrated multimodal mathematics dataset can be downloaded from 🤗datasets and evaluated using the scripts provided in the `Evaluation` folder. The evaluation results will be stored, and accuracy can subsequently be computed with the `calculate_acc.py` file.
```bash
bash ./Evaluation/eval_spark_vl_7b.sh
python calculate_acc.py --result_path ./your_result_path.json
```

## ✒️Citation

```bibtex
@article{liu2025spark,
  title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
  author={Ziyu Liu and Yuhang Zang and Shengyuan Ding and Yuhang Cao and Xiaoyi Dong and Haodong Duan and Dahua Lin and Jiaqi Wang},
  journal={arXiv preprint arXiv:2509.22624},
  year={2025}
}
```

## 📄 License

![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg) ![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)

**Usage and License Notices**: The data and code are intended and licensed for research use only.
License: Attribution-NonCommercial 4.0 International.
It should abide by the policy of OpenAI: https://openai.com/policies/terms-of-use

## Acknowledgement

We sincerely thank projects lmm-r1 and OpenRLHF for providing their open-source resources.
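
## Appendix: Querying the vLLM Server

The `vllm serve` command in the Usage section exposes an OpenAI-compatible HTTP API, so the served model can be queried with a plain chat-completions request. The following is a minimal sketch using only the Python standard library, not part of the released evaluation code. It assumes the server is reachable on `localhost` at the port (`8019`) and served model name (`spark_vl_7b`) from that command. The helper names `encode_image_as_data_url`, `build_chat_payload`, and `query_server` are hypothetical and introduced here for illustration only.

```python
import base64
import json
import mimetypes
import urllib.request


def encode_image_as_data_url(image_path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    # Fall back to JPEG when the extension is not recognized (assumption).
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_chat_payload(image_url: str, prompt: str,
                       model: str = "spark_vl_7b",
                       max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions request with one image and one text part."""
    return {
        "model": model,  # must match --served-model-name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def query_server(payload: dict, host: str = "localhost", port: int = 8019) -> str:
    """POST the payload to the server and return the generated text."""
    request = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, a call such as `query_server(build_chat_payload(encode_image_as_data_url("demo.jpg"), "Describe the image."))` would return the model's answer as a string; `demo.jpg` is a placeholder file name. Sending the image as a data URL avoids requiring the server to have access to the client's file system.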