yuhangzang committed on
Commit b86034f · verified · 1 Parent(s): fe9b97d

Update README.md

Files changed (1): README.md +96 -1
README.md CHANGED
@@ -4,4 +4,99 @@ datasets:
- TIGER-Lab/ViRL39K
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# Spark-VL-7B

⭐ If you find our code or model helpful, please consider giving us a star — your support means a lot!

## Introduction

We propose **SPARK**, **a unified framework that integrates the policy and the reward model into a single model for joint, synchronous training**. SPARK automatically derives reward and reflection data from verifiable rewards, enabling **self-learning** and **self-evolution**. We instantiate this framework on multiple backbones, training SPARK-VL-7B, SPARK-7B, and SPARK-VL-32B. This repo hosts **SPARK-VL-7B**.

## 📢 News
- 🚀 [09/29/2025] We release the **Spark** 📖<a href="https://arxiv.org/abs/2503.01785">paper</a>.
- 🚀 [09/29/2025] We upload our evaluation code and 🤗<a href="https://huggingface.co/internlm/Spark-VL-7B">models</a>.
- 🚀 [09/29/2025] We release the **Spark** 🏠<a href="https://github.com/InternLM/Spark">GitHub repository</a>.

## 💡 Highlights
- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes the policy and the reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model.
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.

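To make the rollout-recycling idea concrete, here is a toy sketch (not SPARK's actual implementation; `recycle_rollouts` and its data layout are invented for illustration) of how rollouts graded by a verifiable reward can be turned into pointwise and pairwise judgment examples:

```python
# Toy illustration: recycle RLVR rollouts, graded by a verifiable reward,
# into pointwise and pairwise reward-model training examples.

def recycle_rollouts(question, rollouts, gold_answer):
    """Build judgment training data from scored rollouts."""
    # Verifiable reward: 1.0 if the rollout's final answer matches the gold answer.
    scored = [(r, float(r["answer"] == gold_answer)) for r in rollouts]

    # Pointwise objective: judge each rollout on its own.
    pointwise = [
        {"question": question, "response": r["text"], "label": score}
        for r, score in scored
    ]

    # Pairwise objective: prefer a correct rollout over an incorrect one.
    pairwise = [
        {"question": question, "chosen": a["text"], "rejected": b["text"]}
        for a, sa in scored
        for b, sb in scored
        if sa > sb
    ]
    return pointwise, pairwise

rollouts = [
    {"text": "... so the answer is 12", "answer": "12"},
    {"text": "... so the answer is 7", "answer": "7"},
]
pointwise, pairwise = recycle_rollouts("What is 3*4?", rollouts, gold_answer="12")
print(len(pointwise), len(pairwise))  # 2 pointwise examples, 1 preference pair
```

The same scored rollouts could also seed reflection data (e.g., asking the model to critique an incorrect rollout), closing the loop described above.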

## 🛠️ Usage
### 🤗 Using Transformers

Our model is based on Qwen2.5-VL-7B-Instruct, so the same inference code works; see the <a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">🤗 Hugging Face model card</a> for details.
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

image_path = "path/to/your/image.png"  # replace with your image
prompt = "Describe this image."        # replace with your prompt

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

### 🔦 Using vLLM

We recommend **vLLM** for faster inference; serving the model with vLLM gives a significant speedup when evaluating whole datasets.
```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=internlm/Spark-VL-7B

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
    --tensor-parallel-size 4 \
    --served-model-name "$SERVE_NAME" \
    --port "$PORT" \
    --max-num-seqs "$N_PROC"
```
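Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client sketch, assuming the port and served model name from the script above (the image URL is a placeholder):

```python
import json

# Assumed values from the serve script above; adjust to your deployment.
BASE_URL = "http://localhost:8019/v1"
MODEL = "spark_vl_7b"

# Chat-completions payload with an image, in the OpenAI-compatible format vLLM serves.
payload = {
    "model": MODEL,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    "max_tokens": 128,
}

print(json.dumps(payload, indent=2))

# To actually send it (requires the server to be running):
#   import requests
#   r = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
#   print(r.json()["choices"][0]["message"]["content"])
```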

## ✒️ Citation
```
TBD
```