nielsr HF Staff committed on
Commit dca8477 · verified · 1 Parent(s): 0c059f4

Add metadata and link to research paper


Hi! This PR improves the model card for Yuan3.0 Flash by adding relevant metadata:
- **`pipeline_tag: image-text-to-text`**: Correctly categorizes the model's multimodal capabilities.
- **`library_name: transformers`**: Added because the repository follows the standard Hugging Face Transformers configuration.
- **`license: other`**: Based on the custom Yuan 3.0 Model License Agreement mentioned in the README.
- **Paper Link**: Connected the model to its research paper [Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications](https://huggingface.co/papers/2601.01718).

These changes will improve the model's discoverability and utility on the Hugging Face Hub.
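Concretely, the metadata block this PR adds to the top of README.md is:

```yaml
---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
---
```

The `pipeline_tag` and `library_name` keys are what the Hub uses for task filtering and for choosing which library snippet to surface, while `license: other` directs readers to the custom license named in the card.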

Files changed (1)

1. README.md (+30 -67)
README.md CHANGED
@@ -1,3 +1,9 @@
 <div align="center">
 <h1>
 Yuan 3.0 Multimodal Foundation Model
@@ -15,25 +21,19 @@
 <a href="https://arxiv.org/abs/2601.01718"><img alt="arXiv"
 src="https://img.shields.io/badge/arXiv-Yuan3.0%20Paper-b31b1b?logo=arxiv&logoColor=white"/></a>
 </a>
-
-
-
 </div>
 
-
 -----
 
-
 
 ## Latest Updates 🎉🎉
 
 * **[2025-12-30]** **Released Yuan 3.0-40B Multimodal Large Language Model, a high-performance model for enterprise-grade application scenarios: Yuan3.0 Flash**
 
-
-
 ## 1. Introduction
 
- Yuan 3.0 Flash, developed by the **YuanLab.ai team**, is a **40B parameter multimodal foundation model** that employs a Mixture of Experts (MoE) architecture, activating only approximately **3.7B parameters** per inference. Through innovative reinforcement learning training methods (RAPO), it significantly reduces inference token consumption while improving reasoning accuracy, exploring the innovative path of "less computation, higher intelligence" for large language models. We have also released the <a href="https://arxiv.org/abs/2601.01718" target="_blank">**technical report**</a> for the Yuan3.0 model, where you can find more detailed technical information and evaluation results.
 
 <div align="center">
 <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-architecture.png" width="80%" />
@@ -55,11 +55,9 @@ Yuan 3.0 Flash outperforms GPT-5.1 in enterprise-grade RAG, multimodal retrieval
 
 <div align="center">
 <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-benchmarks.png" width="80%" />
- Fig.1: Yuan3.0 Multimodal Large Language Model Architecture
 </div>
 
-
-
 ## 3. Core Technology
 
 ### RAPO Reinforcement Learning Algorithm
@@ -76,10 +74,6 @@ The innovative **Reflection-aware Adaptive Policy Optimization (RAPO)** algorith
 | RL+DAPO length-penalty | 46.35% | 13,781 tokens | 89.06% | 3,974 tokens |
 | **RL+RIRM** | **47.92%** | **7,505 tokens** | **89.47%** | **1,777 tokens** |
 
-
-
-
-
 ## 4. Model Download
 
 **We provide download links for multiple model formats:**
@@ -89,10 +83,6 @@ The innovative **Reflection-aware Adaptive Policy Optimization (RAPO)** algorith
 | Yuan3.0 Flash | 40B | 16bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLabAI/Yuan3.0-Flash)
 | Yuan3.0 Flash 4bit | 40B | 4bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash-int4) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLab/Yuan3.0-Flash-4bit)
 
-
-
-
-
 ## 5. Evaluation Results
 
 **5.1 Text-based RAG Evaluation: ChatRAG** 🏆
@@ -101,80 +91,53 @@ Yuan 3.0 Flash leads DeepSeek-V3, DeepSeek-R1 and other large language models in
 
 **Model Average Accuracy Comparison**
 
-
 | Models | Avg All | D2D | QuAC | QReCC | CoQA | DoQA | CFQA | SQA | TCQA | HDial | INSCIT |
 |--------|---------|-----|------|-------|------|------|------|-----|------|-------|--------|
 | **DeepSeek-V3** | 50.47 | 31.59 | 28.86 | 49.31 | 76.98 | 26.11 | 83.49 | 82.13 | 46.69 | 47.43 | 32.08 |
- | **DeepSeek-V3.23** | 49.67 | 34.30 | 28.09 | 49.97 | 77.29 | 29.46 | 72.85 | 79.48 | 44.64 | 47.99 | 32.64 |
 | **OpenAI GPT-4o** | 50.54 | 32.76 | 26.56 | 49.30 | 76.11 | 28.78 | 81.85 | 81.14 | 49.75 | 41.29 | 26.69 |
- | **OpenAI GPT-o3** | 44.06 | 23.05 | 20.82 | 40.42 | 69.42 | 18.56 | 67.75 | 86.71 | 45.85 | 41.29 | 26.69 |
- | **DeepSeek-R1** | 39.42 | 21.46 | 22.23 | 42.41 | 62.53 | 24.68 | 81.48 | 82.06 | 30.74 | 37.97 | 28.68 |
- | **OpenAI GPT-5.1** | 46.10 | 28.24 | 23.16 | 45.43 | 68.84 | 20.88 | 73.05 | 81.32 | 44.70 | 45.39 | 29.95 |
 | **Yuan3.0 Flash** | **64.47** | 49.82 | 53.79 | 57.08 | 90.93 | 59.99 | 74.40 | 87.52 | 66.31 | 68.45 | 36.40 |
 
-
-
- *<small>
- • **Long Context Tests** (D2D, QuAC, QReCC)
- • **Wikipedia Retrieval Tests** (TCQA, INSCIT)
- • **Short Text & Structured Context Tests** (CoQA, DoQA, CFQA, SQA, HDial)
- </small>*
-
 ---
 
-
 **5.2 Multimodal RAG Evaluation: Docmatix** 🏆
 
- Yuan3.0 Flash leads Claude3.5, OpenAI GPT-4o, o3 and other models in the multimodal RAG benchmark Docmatix, with accuracy performance only second to GPT-5.1.
-
- **Model Average Accuracy Comparison**
-
 | Models | Avg. |
 |--------|:---------:|
 | **Qwen2.5-VL-72B-Instruct** | 59.75 |
- | **InternVL3-78B** | 42.99 |
- | **Claude3.5-Sonnet** | 42.55 |
- | **OpenAI GPT-4o** | 56.79 |
- | **OpenAI GPT-o3** | 45.57 |
 | **OpenAI GPT-4V** | 60.10 |
- | **OpenAI GPT-5.1** | 48.52 |
 | **Yuan3.0 Flash** | **65.07** |
 
-
- *<small>**Docmatix** - Evaluates the model's ability to retrieve information, correlate, and accurately answer questions across text, tables, images and other multimodal content in multi-page complex documents.</small>*
-
 ---
 
 **5.3 Multimodal Complex Table Content Analysis Evaluation: MMTab** 🏆
 
- Multimodal table understanding is an important application scenario in enterprise office automation. Yuan3.0 Flash achieves leading average accuracy on 15 evaluation tasks in the industry-standard multimodal complex table understanding benchmark MMTab, surpassing OpenAI's GPT-5.1.
-
- **Model Average Accuracy Comparison**
-
- | Models | Avg. | TABMWP | WTQ | WTQ | HiTab | TAT-QA | FeTaQAU | TabFact | InfoTabs | HiTab_T2T | Rotowire | WikiBIO | TSD_Row | TSD_Col | TCE | TCL | MCD | RCE |
- |--------|:----:|:------:|:---:|:---:|:-----:|:------:|:-------:|:-------:|:--------:|:---------:|:--------:|:-------:|:-------:|:-------:|:---:|:---:|:---:|:---:|
- | **Zhipu GLM-4.5V** | 52.00 | 88.21 | 77.42 | 51.52 | 62.69 | 5.25 | 89.44 | 79.48 | 5.17 | 4.48 | 2.69 | 47.40 | 89.70 | 52.74 | 50.84 | 43.47 | 50.77 | 82.79 |
- | **OpenAI GPT-4V** | 29.90 | 60.50 | 48.00 | 27.50 | 32.50 | 11.04 | 45.50 | 65.60 | 2.98 | 4.23 | 1.94 | 19.00 | 38.00 | 14.36 | 27.91 | 3.50 | 48.52 | 57.14 |
- | **OpenAI GPT-5.1** | 55.15 | 64.95 | 60.77 | 77.77 | 61.37 | 8.70 | 52.81 | 64.30 | 44.16 | 17.81 | 11.95 | 96.60 | 62.10 | 86.43 | 44.66 | 72.46 | 53.58 | 57.20 |
- | **Yuan3.0 Flash** | 58.29 | 95.09 | 68.23 | 69.80 | 69.17 | 28.42 | 87.32 | 83.50 | 13.30 | 14.74 | 17.26 | 46.60 | 82.80 | 56.77 | 56.98 | 65.20 | 62.07 | 73.67 |
 
 ---
 
 **5.4 Text Summarization Generation Evaluation: SummEval** 🏆
 
- Summarization generation is a core requirement for historical information compression in intelligent agent applications. Yuan 3.0 achieves leading average accuracy in the industry-standard summarization generation benchmark SummEval across three major capabilities: lexical overlap, semantic similarity, and factual consistency, surpassing the DeepSeek-V3 large language model.
-
- **Model Average Accuracy Comparison**
-
- | Models | Avg. | Lexical Overlap<br>ROUGE-1 | Lexical Overlap<br>ROUGE-2 | Semantic Similarity<br>BERTScore | Factual Consistency<br>SummaC |
- |--------|:---------:|:-----------:|:-----------:|:--------------:|:------------:|
- | **DeepSeek-V3** | 59.28 | 25.50 | 9.20 | 86.30 | 68.20 |
- | **DeepSeek-V3.2** | 51.36 | 33.30 | 11.92 | 85.61 | 41.76 |
- | **Gemini-2.0-Flash** | 45.35 | 24.80 | 8.70 | 85.70 | 29.50 |
- | **Claude-3.5-Sonnet** | 45.43 | 24.10 | 8.30 | 85.20 | 30.70 |
- | **OpenAI GPT-4o** | 46.53 | 25.00 | 8.90 | 85.90 | 32.50 |
- | **OpenAI GPT-5.1** | 49.44 | 27.48 | 10.16 | 84.63 | 40.50 |
- | **Yuan3.0 Flash** | **59.31** | 51.32 | 28.32 | 89.99 | 45.34 |
@@ -1,3 +1,9 @@
+ ---
+ license: other
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
 <div align="center">
 <h1>
 Yuan 3.0 Multimodal Foundation Model
@@ -15,25 +21,19 @@
 <a href="https://arxiv.org/abs/2601.01718"><img alt="arXiv"
 src="https://img.shields.io/badge/arXiv-Yuan3.0%20Paper-b31b1b?logo=arxiv&logoColor=white"/></a>
 </a>
 </div>
 
 -----
 
+ This repository contains **Yuan 3.0 Flash**, a Mixture-of-Experts (MoE) Multimodal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks. It was introduced in the paper [Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications](https://huggingface.co/papers/2601.01718).
 
 ## Latest Updates 🎉🎉
 
 * **[2025-12-30]** **Released Yuan 3.0-40B Multimodal Large Language Model, a high-performance model for enterprise-grade application scenarios: Yuan3.0 Flash**
 
 ## 1. Introduction
 
+ Yuan 3.0 Flash, developed by the **YuanLab.ai team**, is a **40B parameter multimodal foundation model** that employs a Mixture of Experts (MoE) architecture, activating only approximately **3.7B parameters** per inference. Through innovative reinforcement learning training methods (RAPO), it significantly reduces inference token consumption while improving reasoning accuracy, exploring the innovative path of "less computation, higher intelligence" for large language models.
 
 <div align="center">
 <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-architecture.png" width="80%" />
@@ -55,11 +55,9 @@ Yuan 3.0 Flash outperforms GPT-5.1 in enterprise-grade RAG, multimodal retrieval
 
 <div align="center">
 <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-benchmarks.png" width="80%" />
+ Fig.2: Yuan3.0 Flash Evaluation Results
 </div>
 
 ## 3. Core Technology
 
 ### RAPO Reinforcement Learning Algorithm
@@ -76,10 +74,6 @@ The innovative **Reflection-aware Adaptive Policy Optimization (RAPO)** algorith
 | RL+DAPO length-penalty | 46.35% | 13,781 tokens | 89.06% | 3,974 tokens |
 | **RL+RIRM** | **47.92%** | **7,505 tokens** | **89.47%** | **1,777 tokens** |
 
 ## 4. Model Download
 
 **We provide download links for multiple model formats:**
@@ -89,10 +83,6 @@ The innovative **Reflection-aware Adaptive Policy Optimization (RAPO)** algorith
 | Yuan3.0 Flash | 40B | 16bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLabAI/Yuan3.0-Flash)
 | Yuan3.0 Flash 4bit | 40B | 4bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash-int4) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLab/Yuan3.0-Flash-4bit)
 
 ## 5. Evaluation Results
 
 **5.1 Text-based RAG Evaluation: ChatRAG** 🏆
@@ -101,80 +91,53 @@ Yuan 3.0 Flash leads DeepSeek-V3, DeepSeek-R1 and other large language models in
 
 **Model Average Accuracy Comparison**
 
 | Models | Avg All | D2D | QuAC | QReCC | CoQA | DoQA | CFQA | SQA | TCQA | HDial | INSCIT |
 |--------|---------|-----|------|-------|------|------|------|-----|------|-------|--------|
 | **DeepSeek-V3** | 50.47 | 31.59 | 28.86 | 49.31 | 76.98 | 26.11 | 83.49 | 82.13 | 46.69 | 47.43 | 32.08 |
 | **OpenAI GPT-4o** | 50.54 | 32.76 | 26.56 | 49.30 | 76.11 | 28.78 | 81.85 | 81.14 | 49.75 | 41.29 | 26.69 |
 | **Yuan3.0 Flash** | **64.47** | 49.82 | 53.79 | 57.08 | 90.93 | 59.99 | 74.40 | 87.52 | 66.31 | 68.45 | 36.40 |
 
 ---
 
 **5.2 Multimodal RAG Evaluation: Docmatix** 🏆
 
 | Models | Avg. |
 |--------|:---------:|
 | **Qwen2.5-VL-72B-Instruct** | 59.75 |
 | **OpenAI GPT-4V** | 60.10 |
 | **Yuan3.0 Flash** | **65.07** |
 
 ---
 
 **5.3 Multimodal Complex Table Content Analysis Evaluation: MMTab** 🏆
 
+ | Models | Avg. | TABMWP | WTQ | WTQ | HiTab |
+ |--------|:----:|:------:|:---:|:---:|:-----:|
+ | **OpenAI GPT-5.1** | 55.15 | 64.95 | 60.77 | 77.77 | 61.37 |
+ | **Yuan3.0 Flash** | 58.29 | 95.09 | 68.23 | 69.80 | 69.17 |
 
 ---
 
 **5.4 Text Summarization Generation Evaluation: SummEval** 🏆
 
+ | Models | Avg. | Lexical Overlap ROUGE-1 | Semantic Similarity BERTScore | Factual Consistency SummaC |
+ |--------|:---------:|:-----------:|:--------------:|:------------:|
+ | **DeepSeek-V3** | 59.28 | 25.50 | 86.30 | 68.20 |
+ | **Yuan3.0 Flash** | **59.31** | 51.32 | 89.99 | 45.34 |
 
+ ## 6. Quick Start
 
+ For specific usage methods, please refer to the official [QuickStart](https://github.com/Yuan-lab-LLM/Yuan3.0/blob/main/vllm/README_Yuan.md) guide.
 
+ ## 7. License Agreement
+ The use of Yuan 3.0 code and models must comply with the [《Yuan 3.0 Model License Agreement》](https://github.com/Yuan-lab-LLM/Yuan3.0?tab=License-1-ov-file). The Yuan 3.0 model supports commercial use without requiring authorization application.
 
+ ## 8. Citation
+ ```bibtex
+ @article{yuan3flash2025,
+ title={Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications},
+ author={YuanLab.ai and others},
+ journal={arXiv preprint arXiv:2601.01718},
+ year={2025}
+ }
+ ```