kawhiiiileo committed (verified)
Commit a222f68 · 1 Parent(s): 0c928bd

Update README.md
---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
---

# Innovator-VL-8B-Thinking
## Introduction

**Innovator-VL-8B-Thinking** is a reasoning-oriented multimodal large language model designed for complex scientific problem solving. Built upon Innovator-VL-8B-Instruct, this model is further optimized for explicit multi-step reasoning, long-horizon chain-of-thought generation, and token-efficient scientific analysis.

The model is particularly suitable for scientific tasks that require structured reasoning over visual and textual evidence, such as mathematics, chemistry, materials science, and multimodal scientific benchmarks.
---

## Model Overview

- **Model Type**: Vision-Language Reasoning Model
- **Parameter Size**: 8B
- **Base Language Model**: Qwen3-8B-Base
- **Vision Encoder**: RICE-ViT
- **Projector**: PatchMerger

The model supports native-resolution multi-image inputs and is optimized for reasoning-intensive multimodal scenarios.
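As a minimal sketch of what multi-image input looks like in practice, the snippet below builds the chat-style message structure commonly passed to Hugging Face processors. The content schema (`"type": "image"` / `"type": "text"` entries) and the file names are assumptions for illustration; this card does not specify the exact input format.

```python
# Hypothetical sketch: the interleaved multi-image chat format commonly
# used with Hugging Face multimodal processors. The exact schema expected
# by Innovator-VL-8B-Thinking is an assumption, not confirmed by this card.
def build_multi_image_message(image_paths, question):
    """Build one user turn interleaving several images with a question."""
    content = [{"type": "image", "image": path} for path in image_paths]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]

messages = build_multi_image_message(
    ["figure_a.png", "figure_b.png"],
    "Compare the two phase diagrams and explain the difference.",
)
```

A processor's chat template would then turn `messages` into model inputs; here the point is only the structure: every image is its own content entry, followed by a single text entry.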
---

## Key Characteristics

### Explicit Multimodal Reasoning

Innovator-VL-8B-Thinking is trained to explicitly generate structured reasoning traces, enabling the model to:

- Perform multi-step logical deduction grounded in visual evidence
- Solve complex mathematical and scientific problems
- Maintain reasoning consistency across long contexts
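Downstream code usually needs to separate the reasoning trace from the final answer. The sketch below assumes the model wraps its deliberation in `<think>...</think>` delimiters, a common convention for thinking-style models; the actual delimiter used by Innovator-VL-8B-Thinking is not stated in this card.

```python
import re

def split_reasoning(output_text):
    """Split a generation into (reasoning trace, final answer).

    Assumes <think>...</think> delimiters around the deliberation;
    treat this as an illustrative sketch, not the confirmed format.
    """
    match = re.search(r"<think>(.*?)</think>", output_text, flags=re.DOTALL)
    if match is None:
        # No explicit trace found: return the whole output as the answer.
        return "", output_text.strip()
    reasoning = match.group(1).strip()
    answer = output_text[match.end():].strip()
    return reasoning, answer

trace, answer = split_reasoning(
    "<think>Area = pi * r^2 with r = 2, so 4*pi.</think>The area is 4π."
)
```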
### Reinforcement Learning for Long-Horizon Reasoning

The model is further optimized using reinforcement learning to improve:

- Reasoning correctness
- Output consistency
- Token efficiency in long chain-of-thought generation

Sequence-level optimization enables strong accuracy while significantly reducing unnecessary reasoning tokens.
### Scientific Reasoning Performance

Compared to instruction-only models, Innovator-VL-8B-Thinking demonstrates substantial gains on:

- Multimodal mathematical reasoning benchmarks
- Scientific reasoning and domain-specific QA
- Tasks requiring precise step-by-step analysis
---

## Model Architecture

<img src="assets/innovator_vl_architecture.png" width="600"/>

- **Vision Encoder**: RICE-ViT (region-aware visual representation)
- **Projector**: PatchMerger for visual token compression
- **Language Model**: Qwen3-8B-Base
- **Model Size**: 8B parameters

The architecture is shared with the Instruct variant, while the optimization objective and training strategy differ at the post-training stage.
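The compression idea behind a PatchMerger-style projector can be sketched in a few lines: each 2×2 neighborhood of visual tokens is concatenated into one token, cutting the token count by 4× before a linear layer maps it to the LLM hidden size. The merge factor, toy dimensions, and omitted projection below are illustrative assumptions; the real module's shapes are not given in this card.

```python
# Illustrative 2x2 patch merging (the general idea behind a PatchMerger-
# style projector). Feature vectors are plain Python lists; "+" below is
# list concatenation, so four 3-dim tokens become one 12-dim token.
def merge_patches_2x2(grid):
    """grid: H x W grid (lists) of feature vectors, H and W even.

    Returns an (H/2) x (W/2) grid where each cell concatenates the
    features of one 2x2 neighborhood, shrinking the token count by 4x.
    """
    h, w = len(grid), len(grid[0])
    merged = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            row.append(
                grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]
            )
        merged.append(row)
    return merged

# 4x4 grid of 3-dim "features" -> 2x2 grid of 12-dim merged tokens
grid = [[[float(i), float(j), 1.0] for j in range(4)] for i in range(4)]
merged = merge_patches_2x2(grid)
```

In a real projector the concatenated vector would then pass through a learned linear projection; the sketch stops at the token-count reduction, which is the part that matters for native-resolution inputs.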
---

## Training Pipeline

### Multimodal Pre-training

- Vision-language alignment with LLaVA-1.5 (558K)
- Full-parameter mid-training using LLaVA-OneVision-1.5 (85M)

### Instruction Initialization

- Initialized from Innovator-VL-8B-Instruct
- Supervised fine-tuning with multimodal instruction and reasoning data

### Reinforcement Learning

- Trained with Innovator-VL-RL-172K
- Optimized using Group Sequence Policy Optimization (GSPO)
- Reward design jointly considers reasoning structure and answer correctness
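The group-relative, sequence-level signal behind GSPO-style training can be sketched as follows: rewards for a group of responses sampled from the same prompt are normalized within the group, and the importance ratio is computed once per sequence (length-normalized) rather than per token. The reward numbers and the structure-bonus weighting below are illustrative assumptions, not values from this card.

```python
import math

def group_advantages(rewards):
    """Normalize rewards within one sampled group (zero mean, unit std)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against an all-equal group
    return [(r - mean) / std for r in rewards]

def sequence_ratio(logp_new, logp_old, length):
    """Length-normalized sequence-level importance ratio (GSPO-style)."""
    return math.exp((logp_new - logp_old) / length)

# Combined reward per response: answer correctness plus a small bonus for
# well-formed reasoning structure (weights are illustrative assumptions).
rewards = [1.0 + 0.2, 0.0 + 0.2, 1.0 + 0.0, 0.0 + 0.0]
adv = group_advantages(rewards)
```

Normalizing within the group means the update signal depends only on how a response compares to its siblings, and taking the importance ratio at the sequence level avoids per-token ratio noise over long chains of thought.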
---

## Usage Recommendations

This model is recommended for:

- Multimodal mathematical reasoning
- Scientific problem solving requiring explicit reasoning
- Evaluation settings emphasizing chain-of-thought quality

For general instruction-following or latency-sensitive applications, the Instruct version is recommended.
---

## Citation

```bibtex
@article{innovator-vl,
  title={Innovator-VL: A Multimodal Large Language Model for Scientific Discovery},
  year={2025}
}
```