---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-VL-4B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- chart
- reasoning
- vision-language
- multimodal
- chart-understanding
- VLM
datasets:
- opendatalab/ChartVerse-SFT-600K
- opendatalab/ChartVerse-RL-40K
---

**ChartVerse-4B** is an efficient vision-language model (VLM) specialized for complex chart reasoning, developed as part of the **[opendatalab/ChartVerse](https://huggingface.co/collections/opendatalab/chartverse)** project. For more details about our method, datasets, and full model series, please visit our [GitHub repository](https://github.com/starriver030515/ChartVerse) and [project page](https://chartverse.github.io).

A key highlight is that **ChartVerse-4B (61.9% average) outperforms Qwen3-VL-8B-Thinking (60.0%) despite using only half the parameters**, demonstrating that data quality can outweigh model scale.

## πŸ”₯ Highlights

- **Data quality > model scale**: 4B parameters achieve a 61.9% average score, surpassing Qwen3-VL-8B-Thinking (60.0%)
- **Efficient performance**: delivers 8B-level results with only 4B parameters
- **High-quality training**: trained on ChartVerse-SFT-600K and ChartVerse-RL-40K, built with rigorous truth-anchored QA synthesis
- **Strong reasoning**: equipped with chain-of-thought (CoT) reasoning for complex multi-step chart analysis

## πŸ“Š Model Performance

### Overall Results

<div align="center">
<img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/overall_result.png" width="100%" alt="Overall Performance Comparison">
</div>

### SFT vs. RL Performance

<div align="center">
<img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/training_phases.png" width="100%" alt="Training Phases Performance">
</div>

## πŸ“š Training Data

### [ChartVerse-SFT-600K](https://huggingface.co/datasets/opendatalab/ChartVerse-SFT-600K)
- **412K** unique high-complexity charts
- **603K** QA pairs with **3.9B** tokens of CoT reasoning
- Rollout posterior entropy: **0.44** (highest among compared datasets)
- Truth-anchored answer verification via code execution

### [ChartVerse-RL-40K](https://huggingface.co/datasets/opendatalab/ChartVerse-RL-40K)
- **40K** highest-difficulty samples
- Filtered by rollout failure rate r(Q), keeping only questions with 0 < r(Q) < 1
- Ensures a "hard but solvable" training signal

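The filtering rule above can be sketched as follows. This is an illustrative sketch, not the project's actual pipeline code; the helper names and the use of boolean correctness flags are assumptions:

```python
def failure_rate(correct_flags):
    """Fraction of rollouts that failed for one question Q."""
    return 1.0 - sum(correct_flags) / len(correct_flags)

def keep_for_rl(correct_flags):
    """Keep a question only if it is 'hard but solvable':
    some rollouts fail and some succeed, i.e. 0 < r(Q) < 1."""
    r = failure_rate(correct_flags)
    return 0.0 < r < 1.0

# 16 rollouts per question, True = rollout answered correctly
always_solved = [True] * 16                # r(Q) = 0: too easy, dropped
never_solved = [False] * 16                # r(Q) = 1: unsolvable, dropped
mixed = [True] * 5 + [False] * 11          # 0 < r(Q) < 1: kept

print(keep_for_rl(always_solved))  # False
print(keep_for_rl(never_solved))   # False
print(keep_for_rl(mixed))          # True
```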
## πŸ‹οΈ Training Details

**Supervised Fine-Tuning (SFT)**:
- Framework: LLaMA-Factory
- Dataset: ChartVerse-SFT-600K
- Learning rate: 1.0 Γ— 10⁻⁡
- Global batch size: 128
- Context length: 22,000 tokens

**Reinforcement Learning (RL)**:
- Framework: veRL
- Dataset: ChartVerse-RL-40K
- Algorithm: GSPO
- Learning rate: 1.0 Γ— 10⁻⁢
- Rollout samples: 16 per prompt

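GSPO, like GRPO, scores each rollout against a group-relative baseline computed over the rollouts for the same prompt. The sketch below shows only that advantage-normalization step under the assumption of one scalar reward per rollout; GSPO's sequence-level importance ratios and clipping are omitted, and this is not the veRL implementation:

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each rollout's reward against its group's mean and
    (population) standard deviation, the group-relative baseline used
    by GRPO/GSPO-style algorithms."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all rollouts tied: no learning signal for this group
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# One prompt, 16 rollouts, binary correctness rewards
rewards = [1.0] * 4 + [0.0] * 12
advantages = group_relative_advantages(rewards)
print(advantages[0] > 0)   # correct rollouts get positive advantage
print(advantages[-1] < 0)  # incorrect rollouts get negative advantage
```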
## πŸš€ Quick Start

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# 1. Load model and processor
model_path = "opendatalab/ChartVerse-4B"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

# 2. Prepare input
image_path = "path/to/your/chart.png"
query = "Which region demonstrates the greatest proportional variation in annual revenue compared to its typical revenue level?"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": query},
        ],
    }
]

# 3. Inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=16384)
# Trim the prompt tokens so only the newly generated answer is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
```
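ChartVerse-4B emits chain-of-thought before its final answer. If the decoded output wraps the reasoning in Qwen-style `<think>...</think>` tags (an assumption; inspect your actual outputs, since `skip_special_tokens` may already remove them), a small helper can extract just the answer:

```python
import re

def strip_reasoning(output: str) -> str:
    """Remove a leading <think>...</think> block, if present,
    and return the trimmed final answer."""
    answer = re.sub(r"<think>.*?</think>", "", output, count=1, flags=re.DOTALL)
    return answer.strip()

sample = "<think>Region B swings from 10 to 40 against a mean of 20...</think>\nRegion B"
print(strip_reasoning(sample))  # Region B
```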

## πŸ“– Citation

```bibtex
@article{chartverse2026,
  title={ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch},
  author={Anonymous Authors},
  journal={Anonymous ACL Submission},
  year={2026}
}
```

## πŸ“„ License

This model is released under the Apache 2.0 License.

## πŸ™ Acknowledgements

- Base model: [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)
- Training frameworks: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [veRL](https://github.com/volcengine/verl)
- Evaluation: [VLMEvalKit](https://github.com/open-compass/VLMEvalKit)