starriver030515 committed
Commit 28a2e7c · verified · 1 Parent(s): 78f6487

Create README.md

  1. README.md +141 -0
README.md ADDED
@@ -0,0 +1,141 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen3-VL-8B-Instruct
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ tags:
+ - chart
+ - reasoning
+ - vision-language
+ - multimodal
+ - chart-understanding
+ - VLM
+ - SOTA
+ datasets:
+ - opendatalab/ChartVerse-SFT-600K
+ - opendatalab/ChartVerse-RL-40K
+ ---
+
+ **ChartVerse-8B** is a state-of-the-art Vision Language Model (VLM) for chart reasoning, developed as part of the **[opendatalab/ChartVerse](https://huggingface.co/collections/opendatalab/chartverse)** project. It averages 64.1% across six chart reasoning benchmarks. For details on the method, datasets, and the full model series, see the [GitHub Repository](https://github.com/starriver030515/ChartVerse) and [Project Page](https://chartverse.github.io).
+
+ Most notably, **ChartVerse-8B surpasses its teacher model Qwen3-VL-30B-A3B-Thinking (62.9%) and approaches Qwen3-VL-32B-Thinking (67.0%)**. A student exceeding its teacher breaks the usual distillation ceiling and indicates that high-quality synthetic data can carry a student model past the model that generated its training signal.
+
+ ## 🔥 Highlights
+
+ - **🏆 SOTA Performance**: 64.1% average score across 6 challenging chart benchmarks
+ - **📈 Surpasses Teacher**: Outperforms Qwen3-VL-30B-A3B-Thinking (62.9%) with only 8B parameters
+ - **🎯 Approaches 32B**: Comes within 2.9 points of Qwen3-VL-32B-Thinking (67.0%)
+
+ ## 📊 Model Performance
+
+ ### Overall Results
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/overall_result.png" width="100%" alt="Overall Performance Comparison">
+ </div>
+
+ ### SFT vs RL Performance
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/training_phases.png" width="100%" alt="Training Phases Performance">
+ </div>
+
+ ## 📚 Training Data
+
+ ### [ChartVerse-SFT-600K](https://huggingface.co/datasets/opendatalab/ChartVerse-SFT-600K)
+ - **412K** unique high-complexity charts
+ - **603K** QA pairs with **3.9B** tokens of CoT reasoning
+ - Rollout Posterior Entropy: **0.44** (highest among all datasets)
+ - Truth-anchored answer verification via code execution
+
+ ### [ChartVerse-RL-40K](https://huggingface.co/datasets/opendatalab/ChartVerse-RL-40K)
+ - **40K** highest-difficulty samples
+ - Filtered by failure rate: 0 < r(Q) < 1
+ - Ensures "hard but solvable" training signal
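The failure-rate filter listed above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes r(Q) is the fraction of sampled rollouts whose answer does not match the verified ground truth, and the names `failure_rate` and `select_rl_samples` and the example records are hypothetical.

```python
from typing import Dict, List


def failure_rate(rollout_correct: List[bool]) -> float:
    """Fraction of rollouts that failed (assumed definition of r(Q))."""
    if not rollout_correct:
        raise ValueError("need at least one rollout per question")
    failures = sum(1 for ok in rollout_correct if not ok)
    return failures / len(rollout_correct)


def select_rl_samples(rollouts: Dict[str, List[bool]]) -> List[str]:
    """Keep questions with 0 < r(Q) < 1: failed at least once, solved at least once."""
    return [
        qid
        for qid, results in rollouts.items()
        if 0.0 < failure_rate(results) < 1.0
    ]


# Hypothetical rollout outcomes for three questions (True = correct answer).
example = {
    "q_easy": [True, True, True, True],          # r = 0.0, dropped: always solved
    "q_hard": [True, False, False, True],        # r = 0.5, kept
    "q_unsolved": [False, False, False, False],  # r = 1.0, dropped: never solved
}
print(select_rl_samples(example))
```

Dropping both extremes removes questions the model already solves every time and questions it never solves, which is one way to read the "hard but solvable" criterion.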
+
+ ## 🏋️ Training Details
+
+ **Supervised Fine-Tuning (SFT)**:
+ - Framework: LLaMA-Factory
+ - Dataset: ChartVerse-SFT-600K
+ - Learning rate: 1.0 × 10⁻⁵
+ - Global batch size: 128
+ - Context length: 22,000 tokens
+ - Training time: ~1.5 days on 32× A100 GPUs
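A global batch size of 128 on 32 GPUs has to be split into a per-device micro-batch and a number of gradient-accumulation steps. The sketch below enumerates the splits that satisfy that identity, assuming plain data parallelism across all 32 GPUs. The source does not say which split or parallelism scheme was actually used, and the function name `batch_splits` is hypothetical.

```python
from typing import List, Tuple


def batch_splits(global_batch: int, num_gpus: int) -> List[Tuple[int, int]]:
    """Return (per_device_batch, grad_accum_steps) pairs such that
    per_device_batch * grad_accum_steps * num_gpus == global_batch.

    Assumes pure data parallelism, i.e. every GPU holds a full model replica.
    """
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must be divisible by the GPU count")
    per_gpu_total = global_batch // num_gpus
    return [
        (micro, per_gpu_total // micro)
        for micro in range(1, per_gpu_total + 1)
        if per_gpu_total % micro == 0
    ]


# 128 samples per optimizer step spread over 32 GPUs.
print(batch_splits(128, 32))
```

In LLaMA-Factory the two numbers in each pair correspond to `per_device_train_batch_size` and `gradient_accumulation_steps`. With a 22,000-token context, a smaller micro-batch with more accumulation steps would typically trade throughput for lower peak memory.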
+
+ **Reinforcement Learning (RL)**:
+ - Framework: veRL
+ - Dataset: ChartVerse-RL-40K
+ - Algorithm: GSPO
+ - Learning rate: 1.0 × 10⁻⁶
+ - Rollout samples: 16 per prompt
+ - Training time: ~4 days on 32× A100 GPUs
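To illustrate why 16 rollouts per prompt matter, the sketch below computes group-relative advantages, the reward normalization that group-based policy-optimization methods such as GSPO apply within each prompt's rollout group. It is an assumption-laden illustration, not the veRL implementation: the binary rewards, the population standard deviation, the `eps` value, and the name `group_advantages` are all hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's rollout group:
    advantage_i = (reward_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# 16 hypothetical binary rewards for one prompt (1.0 = verified-correct answer).
mixed = [1.0] * 4 + [0.0] * 12
adv = group_advantages(mixed)
print(adv[0] > 0, adv[-1] < 0)  # correct rollouts are pushed up, wrong ones down

# If every rollout gets the same reward, all advantages are zero.
uniform = group_advantages([1.0] * 16)
print(all(a == 0.0 for a in uniform))
```

Under this formulation a prompt whose rollouts all succeed or all fail contributes no gradient signal, which is consistent with restricting the RL set to questions with 0 < r(Q) < 1.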
+
+ ## 🚀 Quick Start
+
+ ```python
+ from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+
+ # 1. Load the model and its processor
+ model_path = "opendatalab/ChartVerse-8B"
+ model = Qwen3VLForConditionalGeneration.from_pretrained(
+     model_path, torch_dtype="auto", device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained(model_path)
+
+ # 2. Prepare the input: one chart image plus a question about it
+ image_path = "path/to/your/chart.png"
+ query = "Which region demonstrates the greatest proportional variation in annual revenue compared to its typical revenue level?"
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image_path},
+             {"type": "text", "text": query},
+         ],
+     }
+ ]
+
+ # 3. Run inference
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ image_inputs, _ = process_vision_info(messages)  # no video inputs in this example
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     padding=True,
+     return_tensors="pt",
+ ).to(model.device)  # follow device_map instead of hard-coding "cuda"
+ generated_ids = model.generate(**inputs, max_new_tokens=16384)
+ # Drop the prompt tokens so that only the newly generated answer is decoded
+ trimmed_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
+ output_text = processor.batch_decode(
+     trimmed_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )
+ print(output_text[0])
+ ```
+
+ ## 📖 Citation
+
+ ```bibtex
+ @article{chartverse2026,
+   title={ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch},
+   author={Anonymous Authors},
+   journal={Anonymous ACL Submission},
+   year={2026}
+ }
+ ```
+
+ ## 📄 License
+
+ This model is released under the Apache 2.0 License.
+
+ ## 🙏 Acknowledgements
+
+ - Base model: [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
+ - Teacher model: Qwen3-VL-30B-A3B-Thinking
+ - Training frameworks: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [veRL](https://github.com/volcengine/verl)
+ - Evaluation: [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), [CompassVerifier](https://github.com/open-compass/CompassVerifier)