---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- TIGER-Lab/ViRL39K
license: mit
library_name: transformers
pipeline_tag: video-text-to-text
tags:
- lvlm
- reasoning
- multimodal
- qwen
---

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63859cf3b2906edaf83af9f0/FGS454laRCGTIAzgrbGdG.png" alt="logo" width="200">
</p>

# Spark-VL-7B

โญ If you find our code or model helpful, please consider giving us a star โ€” your support means a lot!

๐ŸŒˆ Try our demo on ๐Ÿค—<a href="https://huggingface.co/spaces/yuhangzang/spark">Huggingface Demo</a></h3>

๐Ÿ <a href="https://github.com/InternLM/Spark">Github repository</a>
๐Ÿ“–<a href="https://huggingface.co/papers/2509.22624">Daily Paper</a>
๐Ÿค—<a href="https://huggingface.co/internlm/Spark-VL-7B">models</a>
๐Ÿ“–<a href="https://arxiv.org/abs/2509.22624">Paper</a>

## Introduction

We propose **SPARK**, **a unified framework that integrates policy and reward into a single model for joint and synchronous training**. SPARK automatically derives reward and reflection data from verifiable rewards, enabling **self-learning** and **self-evolution**. Furthermore, we instantiate this framework on multiple backbones, training SPARK-VL-7B, SPARK-7B, and SPARK-VL-32B. This repo contains **SPARK-VL-7B**.

## 📢 News
- 🚀 [09/29/2025] We release our 🤗<a href="https://huggingface.co/datasets/internlm/Spark-Data">datasets</a>.
- 🚀 [09/29/2025] We release the **Spark** 📖<a href="https://arxiv.org/abs/2509.22624">paper</a>.
- 🚀 [09/29/2025] We upload our evaluation code and 🤗<a href="https://huggingface.co/internlm/Spark-VL-7B">models</a>.
- 🚀 [09/29/2025] We release the **Spark** 🏠<a href="https://github.com/InternLM/Spark">GitHub repository</a>.

## 💡 Highlights
- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes policy and reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives (see the sketch after this list), enabling the model itself to act as both a strong policy and a generative reward model.
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.
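The recycling idea can be illustrated with a minimal, self-contained sketch. This is **not** the Spark training code; the data formats and the toy verifier below are assumptions made purely to show how verified rollouts could be turned into pointwise, pairwise, and reflection examples:

```python
# Illustrative sketch only: recycle verifiable-reward rollouts into reward-training data.
# All data formats and the toy verifier are hypothetical, not the actual Spark pipeline.

def recycle_rollouts(prompt, rollouts, ground_truth, verify):
    """rollouts: generated answer strings; verify(answer, gt) -> bool (the verifiable reward)."""
    scored = [(r, verify(r, ground_truth)) for r in rollouts]

    # Pointwise: judge a single answer as correct / incorrect.
    pointwise = [{"task": "judge", "prompt": prompt, "answer": r, "label": int(ok)}
                 for r, ok in scored]

    # Pairwise: prefer a verified-correct answer over a verified-wrong one.
    pairwise = [{"task": "compare", "prompt": prompt, "chosen": good, "rejected": bad}
                for good, g_ok in scored if g_ok
                for bad, b_ok in scored if not b_ok]

    # Reflection: revise a wrong answer given a correct reference from the same rollout batch.
    correct = [r for r, ok in scored if ok]
    reflection = ([{"task": "reflect", "prompt": prompt, "wrong": bad, "reference": correct[0]}
                   for bad, ok in scored if not ok] if correct else [])

    return pointwise, pairwise, reflection


# Toy usage with a trivial string-matching verifier.
data = recycle_rollouts(
    prompt="What is 2 + 3?",
    rollouts=["The answer is 5.", "The answer is 6.", "5"],
    ground_truth="5",
    verify=lambda ans, gt: gt in ans,
)
```

In SPARK, objectives of this kind are optimized jointly with the RLVR policy objective by the same model, which is what drives the co-evolving loop described above.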

## 🛠️ Usage
### 🤗 Using Transformers

Our model is based on Qwen2.5-VL-7B-Instruct. You can use the same inference code as Qwen2.5-VL-7B-Instruct; see the <a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">🤗 Hugging Face</a> model card for details.
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

# Replace these placeholders with your own inputs
image_path = "path/to/your/image.jpg"
prompt = "Describe this image."

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
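The model also accepts video inputs (note the `videos=video_inputs` argument above). A minimal sketch of a video message, assuming a placeholder local video path; the rest of the pipeline (chat template, `process_vision_info`, processor, `generate`) stays the same as in the image example:

```python
# Placeholder path; replace with your own video file.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "file:///path/to/video.mp4", "fps": 1.0},
            {"type": "text", "text": "Describe what happens in this video."},
        ],
    }
]
```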

### 🔦 Using vLLM

We recommend using **vLLM** for faster inference; it yields significant speed-ups when evaluating full datasets.
```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=/internlm/Spark-VL-7B

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
  --tensor-parallel-size 4 \
  --served-model-name $SERVE_NAME \
  --port $PORT \
  --max-num-seqs $N_PROC
```
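
Once the server is up, you can query it through vLLM's OpenAI-compatible API. A minimal sketch, assuming the `PORT` and `SERVE_NAME` values from the command above and a placeholder image URL:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8019/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="spark_vl_7b",  # must match --served-model-name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/problem.png"}},
                {"type": "text", "text": "Solve the problem in the image step by step."},
            ],
        }
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```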


## Training

### Spark Training 
After downloading the dataset, you can start training using the example bash scripts in `/Spark/Lmm_XC/XC/scripts/spark_training`.
You need to modify the dataset and model paths to your own locations, as in the excerpt below.
```bash
export WORKSPACE_DIR="/fs-computility/....../Lmm_XC"                 # Path to project root directory
export DATASET_PATH="/fs-computility/....../infer_data_ViRL_19k.json"            # Path to your dataset
export PRETRAIN_MODEL_PATH="/fs-computility/....../Qwen2.5-VL-7B-Instruct"  # Path to pretrained model
export WANDB_PROJECT="Observation"        # Name for this project
export MODEL_CPK_NAME="Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2"         # Name for this training run
export LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt'      #Log file save path


export WANDB_API_KEY="......"
export SAVE_PATH="/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}"                   # Absolute path to save everything about this training run
export CKPT_PATH="${SAVE_PATH}/ckpt"                                                                    # Path to save checkpoints                                    
export FINAL_CKPT_PATH="${SAVE_PATH}/final_ckpt"                                                        # Path to save final checkpoints
export TIMESTAMP=$(date +%Y%m%d_%H%M%S)                                                                 # Timestamp
export CUR_LOG_DIR="${SAVE_PATH}/training_logs/${TIMESTAMP}"                                            # Path to save current run logs
export LOG_DIR="${SAVE_PATH}/tb_logs"  
```
โฐ Attention:
```
export DEV_MODE=0 # Set to 1 for debug mode on single dev machine
```

## Evaluation
The integrated multimodal mathematics dataset can be downloaded from our 🤗<a href="https://huggingface.co/datasets/internlm/Spark-Data">datasets</a> and evaluated using the scripts provided in the `Evaluation` folder. The evaluation results are saved to a JSON file, and accuracy can then be computed with `calculate_acc.py`:
```bash
bash ./Evaluation/eval_spark_vl_7b.sh
python calculate_acc.py --result_path ./your_result_path.json
```

## ✒️ Citation
```bibtex
@article{liu2025spark,
  title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
  author={Ziyu Liu and Yuhang Zang and Shengyuan Ding and Yuhang Cao and Xiaoyi Dong and Haodong Duan and Dahua Lin and Jiaqi Wang},
  journal={arXiv preprint arXiv:2509.22624},
  year={2025}
}
```

## 📄 License
![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg) ![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg) **Usage and License Notices**: The data and code are intended and licensed for research use only.
The data is released under the Attribution-NonCommercial 4.0 International license, and usage should abide by the OpenAI terms of use: https://openai.com/policies/terms-of-use

## Acknowledgement
We sincerely thank the <a href="https://github.com/TideDra/lmm-r1">lmm-r1</a> and <a href="https://github.com/OpenRLHF/OpenRLHF">OpenRLHF</a> projects for providing their open-source resources.