nielsr (HF Staff) committed
Commit 51d74ea · verified · 1 parent: 99f2f73

Update model card metadata and add links to paper/code


This PR improves the model card for **Innovator-VL-8B-Thinking** by:
1. Updating the `pipeline_tag` to `image-text-to-text` for better categorization.
2. Adding `library_name: transformers` to enable automated code snippets on the Hub.
3. Adding explicit links to the paper, project page, and GitHub repository.
4. Ensuring the model card provides a clear overview of the model's architecture and training stages as described in the technical report.
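For reference, the metadata block this PR produces at the top of `README.md` (assembled from the diff below) reads:

```yaml
---
language:
- en
- zh
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---
```

With `library_name: transformers` set, the Hub can render an automated usage snippet, and `image-text-to-text` places the model under the correct task filter.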

Files changed (1)
  1. README.md +26 -18
README.md CHANGED
@@ -1,13 +1,16 @@
 ---
-license: mit
 language:
 - en
 - zh
-pipeline_tag: text-generation
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # Innovator-VL-8B-Thinking
 
+[[Paper](https://huggingface.co/papers/2601.19325)] [[Project Page](https://innovatorlm.github.io/Innovator-VL)] [[GitHub](https://github.com/InnovatorLM/Innovator-VL)] [[Demo](https://huggingface.co/spaces/InnovatorLab/Innovator-VL)]
+
 ## Introduction
 
 **Innovator-VL-8B-Thinking** is a multimodal reasoning-oriented large
@@ -41,16 +44,18 @@ for reasoning-intensive multimodal scenarios.
 ### Explicit Multimodal Reasoning
 
 Innovator-VL-8B-Thinking is trained to explicitly generate structured
-reasoning traces, enabling the model to: - Perform multi-step logical
-deduction grounded in visual evidence - Solve complex mathematical and
-scientific problems - Maintain reasoning consistency across long
-contexts
+reasoning traces, enabling the model to:
+- Perform multi-step logical deduction grounded in visual evidence
+- Solve complex mathematical and scientific problems
+- Maintain reasoning consistency across long contexts
 
 ### Reinforcement Learning for Long-Horizon Reasoning
 
 The model is further optimized using reinforcement learning to
-improve: - Reasoning correctness - Output consistency - Token efficiency
-in long chain-of-thought generation
+improve:
+- Reasoning correctness
+- Output consistency
+- Token efficiency in long chain-of-thought generation
 
 Sequence-level optimization enables strong accuracy while significantly
 reducing unnecessary reasoning tokens.
@@ -58,15 +63,16 @@ reducing unnecessary reasoning tokens.
 ### Scientific Reasoning Performance
 
 Compared to instruction-only models, Innovator-VL-8B-Thinking
-demonstrates substantial gains on: - Multimodal mathematical reasoning
-benchmarks - Scientific reasoning and domain-specific QA - Tasks
-requiring precise step-by-step analysis
+demonstrates substantial gains on:
+- Multimodal mathematical reasoning benchmarks
+- Scientific reasoning and domain-specific QA
+- Tasks requiring precise step-by-step analysis
 
 ------------------------------------------------------------------------
 
 ## Model Architecture
 
-<img src="assets/innovator_vl_architecture.png" width="600"/>
+<img src="https://huggingface.co/InnovatorLab/Innovator-VL-8B-Thinking/resolve/main/assets/innovator_vl_architecture.png" width="600"/>
 
 - **Vision Encoder**: RICE-ViT (region-aware visual representation)
 - **Projector**: PatchMerger for visual token compression
@@ -103,12 +109,12 @@ stage.
 
 ## Usage Recommendations
 
-This model is recommended for: - Multimodal mathematical reasoning -
-Scientific problem solving requiring explicit reasoning - Evaluation
-settings emphasizing chain-of-thought quality
+This model is recommended for:
+- Multimodal mathematical reasoning
+- Scientific problem solving requiring explicit reasoning
+- Evaluation settings emphasizing chain-of-thought quality
 
-For general instruction-following or latency-sensitive applications, the
-Instruct version is recommended.
+For general instruction-following or latency-sensitive applications, the Instruct version is recommended.
 
 ------------------------------------------------------------------------
 
@@ -154,7 +160,9 @@ messages = [
                 "type": "image",
                 "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
             },
-            {"type": "text", "text": f"{THINKING_PROMPT}\n\n{question}"},
+            {"type": "text", "text": f"{THINKING_PROMPT}
+
+{question}"},
         ],
     }
 ]
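In the usage snippet touched by the last hunk, the text entry combines the thinking prompt and the question separated by a blank line. Note that a plain (non-triple-quoted) f-string cannot span physical lines in Python, so the single-line form with `\n\n` is the valid spelling of that field. A minimal sketch, with placeholder values for `THINKING_PROMPT` and `question` (the actual prompt text ships with the model's usage example):

```python
# Placeholder values for illustration only.
THINKING_PROMPT = "Please reason step by step before giving the final answer."
question = "What is shown in this image?"

# Compose the text field as in the messages list: prompt, blank line, question.
text = f"{THINKING_PROMPT}\n\n{question}"

# The two parts are separated by exactly one empty line.
prompt_part, blank, question_part = text.splitlines()
```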