---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- TIGER-Lab/ViRL39K
license: mit
library_name: transformers
pipeline_tag: video-text-to-text
tags:
- lvlm
- reasoning
- multimodal
- qwen
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63859cf3b2906edaf83af9f0/FGS454laRCGTIAzgrbGdG.png" alt="logo" width="200">
</p>
# Spark-VL-7B
⭐ If you find our code or model helpful, please consider giving us a star; your support means a lot!
🚀 Try our demo on the 🤗 <a href="https://huggingface.co/spaces/yuhangzang/spark">Huggingface Demo</a>
🌟 <a href="https://github.com/InternLM/Spark">Github repository</a>
📰 <a href="https://huggingface.co/papers/2509.22624">Daily Paper</a>
🤗 <a href="https://huggingface.co/internlm/Spark-VL-7B">Models</a>
📖 <a href="https://arxiv.org/abs/2509.22624">Paper</a>
## Introduction
We propose **SPARK**, **a unified framework that integrates policy and reward into a single model for joint and synchronous training**. SPARK automatically derives reward and reflection data from verifiable reward signals, enabling **self-learning** and **self-evolution**. We instantiate this framework on multiple backbones, training SPARK-VL-7B, SPARK-7B, and SPARK-VL-32B. This repo hosts **SPARK-VL-7B**.
## 📢 News
- 🎉 [09/29/2025] We release our 🤗 <a href="https://huggingface.co/datasets/internlm/Spark-Data">datasets</a>.
- 🎉 [09/29/2025] We release the **Spark** 📖 <a href="https://arxiv.org/abs/2509.22624">Paper</a>.
- 🎉 [09/29/2025] We upload our evaluation code and 🤗 <a href="https://huggingface.co/internlm/Spark-VL-7B">models</a>.
- 🎉 [09/29/2025] We release the **Spark** 🌟 <a href="https://github.com/InternLM/Spark">Github repository</a>.
## 💡 Highlights
- 🔥 **Synergistic Policy-Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes policy and reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model (see the sketch after this list).
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.
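To make the recycling idea concrete, here is a toy, self-contained sketch of how rollouts scored by a verifiable reward could be rebuilt into pointwise, pairwise, and reflection training examples. This is our illustrative reading of the description above, not the authors' implementation; the field names and prompt templates are placeholders.
```python
# Toy illustration of recycling RLVR rollouts (illustrative only).
from itertools import combinations

def recycle_rollouts(question, rollouts):
    """rollouts: list of (response_text, verifiable_reward) pairs."""
    pointwise, pairwise, reflection = [], [], []
    for resp, score in rollouts:
        # Pointwise: train the model to judge a single response as correct or not.
        pointwise.append({
            "prompt": f"Judge this answer to: {question}\n{resp}",
            "label": score > 0,
        })
    for (resp_a, score_a), (resp_b, score_b) in combinations(rollouts, 2):
        if score_a != score_b:
            # Pairwise: train the model to prefer the higher-reward response.
            chosen, rejected = (resp_a, resp_b) if score_a > score_b else (resp_b, resp_a)
            pairwise.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    wrong = [r for r, s in rollouts if s <= 0]
    right = [r for r, s in rollouts if s > 0]
    if wrong and right:
        # Reflection: revise a failed attempt toward a verified-correct one.
        reflection.append({
            "prompt": f"{question}\nFailed attempt:\n{wrong[0]}\nReflect and correct it.",
            "target": right[0],
        })
    return pointwise, pairwise, reflection

# Two rollouts for one question, scored by a verifiable checker.
pw, pr, rf = recycle_rollouts("What is 2 + 2?",
                              [("The answer is 4.", 1.0), ("The answer is 5.", 0.0)])
print(len(pw), len(pr), len(rf))  # 2 1 1
```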
## 🛠️ Usage
### 🤗 Using Transformers
Our model is based on Qwen2.5-VL-7B-Instruct, so you can run inference with the same code as that model; see the <a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">🤗 Huggingface</a> model card for details.
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

image_path = "path/to/your/image.png"  # replace with your image
prompt = "Solve the problem in the image step by step."  # replace with your prompt

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
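Since the pipeline tag is video-text-to-text, the same code path also accepts video input; only the message content changes. A minimal sketch (the video path and sampling rate are placeholders):
```python
# Video input reuses the pipeline above; only the message content changes.
messages = [
    {
        "role": "user",
        "content": [
            # Placeholder path; qwen_vl_utils also accepts URLs and frame lists.
            {"type": "video", "video": "file:///path/to/video.mp4", "fps": 1.0},
            {"type": "text", "text": "Describe the key steps shown in this video."},
        ],
    }
]
# The remaining steps (apply_chat_template, process_vision_info, processor(...),
# model.generate) are identical to the image example above.
```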
### 📦 Using vLLM
We recommend **vLLM** for faster inference; it gives a significant speed-up when evaluating whole datasets.
```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=internlm/Spark-VL-7B  # or a local checkpoint path
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
--tensor-parallel-size 4 \
--served-model-name $SERVE_NAME \
--port $PORT \
--max-num-seqs $N_PROC
```
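The server exposes an OpenAI-compatible API, so any OpenAI client can query it. A minimal sketch (the port and served model name match the command above; the image URL is a placeholder):
```python
# Minimal client for the vLLM server above (OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8019/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="spark_vl_7b",  # must match --served-model-name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/problem.png"}},  # placeholder
            {"type": "text", "text": "Solve the problem in the image step by step."},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```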
## Training
### Spark Training
After downloading the dataset, you can start training with the example configuration below. Our bash scripts live in `Spark/Lmm_XC/XC/scripts/spark_training`; modify the dataset and model paths to your own locations first.
```bash
export WORKSPACE_DIR="/fs-computility/....../Lmm_XC" # Path to project root directory
export DATASET_PATH="/fs-computility/....../infer_data_ViRL_19k.json" # Path to your dataset
export PRETRAIN_MODEL_PATH="/fs-computility/....../Qwen2.5-VL-7B-Instruct" # Path to pretrained model
export WANDB_PROJECT="Observation" # Name for this project
export MODEL_CPK_NAME="Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2" # Name for this training run
export LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt' # Log file save path
export WANDB_API_KEY="......"
export SAVE_PATH="/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}" # Absolute path to save everything about this training run
export CKPT_PATH="${SAVE_PATH}/ckpt" # Path to save checkpoints
export FINAL_CKPT_PATH="${SAVE_PATH}/final_ckpt" # Path to save final checkpoints
export TIMESTAMP=$(date +%Y%m%d_%H%M%S) # Timestamp
export CUR_LOG_DIR="${SAVE_PATH}/training_logs/${TIMESTAMP}" # Path to save current run logs
export LOG_DIR="${SAVE_PATH}/tb_logs"
```
⏰ Attention:
```bash
export DEV_MODE=0 # Set to 1 for debug mode on single dev machine
```
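With the variables exported, launch training via one of the provided scripts. The script name below is illustrative; use the actual script from `Spark/Lmm_XC/XC/scripts/spark_training` that matches your run:
```bash
cd "$WORKSPACE_DIR"
# Hypothetical script name; pick the matching one from scripts/spark_training.
bash XC/scripts/spark_training/train_spark_vl_7b.sh
```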
## Evaluation
The integrated multimodal mathematics dataset can be downloaded from 🤗 <a href="https://huggingface.co/datasets/internlm/Spark-Data">datasets</a> and evaluated using the scripts provided in the `Evaluation` folder. The evaluation results are saved to disk, and accuracy can then be computed with `calculate_acc.py`:
```bash
bash ./Evaluation/eval_spark_vl_7b.sh
python calculate_acc.py --result_path ./your_result_path.json
```
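The result schema is defined by the evaluation scripts in the repo; as a rough sketch, assuming each record stores a prediction and a ground-truth answer, the accuracy computation looks like:
```python
# Rough sketch of the accuracy computation; the real calculate_acc.py lives in
# the repo, and the "pred"/"answer" field names here are assumptions.
import json

with open("your_result_path.json") as f:
    results = json.load(f)

correct = sum(1 for r in results if r["pred"] == r["answer"])
print(f"Accuracy: {correct / len(results):.4f} ({correct}/{len(results)})")
```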
## ✒️ Citation
```bibtex
@article{liu2025spark,
title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
author={Ziyu Liu and Yuhang Zang and Shengyuan Ding and Yuhang Cao and Xiaoyi Dong and Haodong Duan and Dahua Lin and Jiaqi Wang},
journal={arXiv preprint arXiv:2509.22624},
year={2025}
}
```
## 📄 License
**Usage and License Notices**: The data and code are intended and licensed for research use only, under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Use should also abide by OpenAI's terms of use: https://openai.com/policies/terms-of-use
## Acknowledgement
We sincerely thank the <a href="https://github.com/TideDra/lmm-r1">lmm-r1</a> and <a href="https://github.com/OpenRLHF/OpenRLHF">OpenRLHF</a> projects for their open-source resources.