Add metadata and link to research paper
Browse filesHi! This PR improves the model card for Yuan3.0 Flash by adding relevant metadata:
- **`pipeline_tag: image-text-to-text`**: Correctly categorizes the model's multimodal capabilities.
- **`library_name: transformers`**: Enabled because the repository follows standard configuration for Hugging Face Transformers.
- **`license: other`**: Based on the custom Yuan 3.0 Model License Agreement mentioned in the README.
- **Paper Link**: Connected the model to its research paper [Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications](https://huggingface.co/papers/2601.01718).
These changes will improve the model's discoverability and utility on the Hugging Face Hub.
README.md
CHANGED
|
@@ -1,3 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
<div align="center">
|
| 2 |
<h1>
|
| 3 |
Yuan 3.0 Multimodal Foundation Model
|
|
@@ -15,25 +21,19 @@
|
|
| 15 |
<a href="https://arxiv.org/abs/2601.01718"><img alt="arXiv"
|
| 16 |
src="https://img.shields.io/badge/arXiv-Yuan3.0%20Paper-b31b1b?logo=arxiv&logoColor=white"/></a>
|
| 17 |
</a>
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
</div>
|
| 22 |
|
| 23 |
-
|
| 24 |
-----
|
| 25 |
|
| 26 |
-
|
| 27 |
|
| 28 |
## Latest Updates ๐๐
|
| 29 |
|
| 30 |
* **[2025-12-30]** **Released Yuan 3.0-40B Multimodal Large Language Model, a high-performance model for enterprise-grade application scenarios: Yuan3.0 Flash**
|
| 31 |
|
| 32 |
-
|
| 33 |
-
|
| 34 |
## 1. Introduction
|
| 35 |
|
| 36 |
-
Yuan 3.0 Flash, developed by the **YuanLab.ai team**, is a **40B parameter multimodal foundation model** that employs a Mixture of Experts (MoE) architecture, activating only approximately **3.7B parameters** per inference. Through innovative reinforcement learning training methods (RAPO), it significantly reduces inference token consumption while improving reasoning accuracy, exploring the innovative path of "less computation, higher intelligence" for large language models.
|
| 37 |
|
| 38 |
<div align="center">
|
| 39 |
<img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-architecture.png" width="80%" />
|
|
@@ -55,11 +55,9 @@ Yuan 3.0 Flash outperforms GPT-5.1 in enterprise-grade RAG, multimodal retrieval
|
|
| 55 |
|
| 56 |
<div align="center">
|
| 57 |
<img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-benchmarks.png" width="80%" />
|
| 58 |
-
Fig.
|
| 59 |
</div>
|
| 60 |
|
| 61 |
-
|
| 62 |
-
|
| 63 |
## 3. Core Technology
|
| 64 |
|
| 65 |
### RAPO Reinforcement Learning Algorithm
|
|
@@ -76,10 +74,6 @@ The innovative **Reflection-aware Adaptive Policy Optimization (RAPO)** algorith
|
|
| 76 |
| RL+DAPO length-penalty | 46.35% | 13,781 tokens | 89.06% | 3,974 tokens |
|
| 77 |
| **RL+RIRM** | **47.92%** | **7,505 tokens** | **89.47%** | **1,777 tokens** |
|
| 78 |
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
## 4. Model Download
|
| 84 |
|
| 85 |
**We provide download links for multiple model formats:**
|
|
@@ -89,10 +83,6 @@ The innovative **Reflection-aware Adaptive Policy Optimization (RAPO)** algorith
|
|
| 89 |
| Yuan3.0 Flash | 40B | 16bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLabAI/Yuan3.0-Flash)
|
| 90 |
| Yuan3.0 Flash 4bit | 40B | 4bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash-int4) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLab/Yuan3.0-Flash-4bit)
|
| 91 |
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
## 5. Evaluation Results
|
| 97 |
|
| 98 |
**5.1 Text-based RAG Evaluation: ChatRAG** ๐
|
|
@@ -101,80 +91,53 @@ Yuan 3.0 Flash leads DeepSeek-V3, DeepSeek-R1 and other large language models in
|
|
| 101 |
|
| 102 |
**Model Average Accuracy Comparison**
|
| 103 |
|
| 104 |
-
|
| 105 |
| Models | Avg All | D2D | QuAC | QReCC | CoQA | DoQA | CFQA | SQA | TCQA | HDial | INSCIT |
|
| 106 |
|--------|---------|-----|------|-------|------|------|------|-----|------|-------|--------|
|
| 107 |
| **DeepSeek-V3** | 50.47 | 31.59 | 28.86 | 49.31 | 76.98 | 26.11 | 83.49 | 82.13 | 46.69 | 47.43 | 32.08 |
|
| 108 |
-
| **DeepSeek-V3.23** | 49.67 | 34.30 | 28.09 | 49.97 | 77.29 | 29.46 | 72.85 | 79.48 | 44.64 | 47.99 | 32.64 |
|
| 109 |
| **OpenAI GPT-4o** | 50.54 | 32.76 | 26.56 | 49.30 | 76.11 | 28.78 | 81.85 | 81.14 | 49.75 | 41.29 | 26.69 |
|
| 110 |
-
| **OpenAI GPT-o3** | 44.06 | 23.05 | 20.82 | 40.42 | 69.42 | 18.56 | 67.75 | 86.71 | 45.85 | 41.29 | 26.69 |
|
| 111 |
-
| **DeepSeek-R1** | 39.42 | 21.46 | 22.23 | 42.41 | 62.53 | 24.68 | 81.48 | 82.06 | 30.74 | 37.97 | 28.68 |
|
| 112 |
-
| **OpenAI GPT-5.1** | 46.10 | 28.24 | 23.16 | 45.43 | 68.84 | 20.88 | 73.05 | 81.32 | 44.70 | 45.39 | 29.95 |
|
| 113 |
| **Yuan3.0 Flash** | **64.47** | 49.82 | 53.79 | 57.08 | 90.93 | 59.99 | 74.40 | 87.52 | 66.31 | 68.45 | 36.40 |
|
| 114 |
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
*<small>
|
| 118 |
-
โข **Long Context Tests** (D2D, QuAC, QReCC)
|
| 119 |
-
โข **Wikipedia Retrieval Tests** (TCQA, INSCIT)
|
| 120 |
-
โข **Short Text & Structured Context Tests** (CoQA, DoQA, CFQA, SQA, HDial)
|
| 121 |
-
</small>*
|
| 122 |
-
|
| 123 |
---
|
| 124 |
|
| 125 |
-
|
| 126 |
**5.2 Multimodal RAG Evaluation: Docmatix** ๐
|
| 127 |
|
| 128 |
-
Yuan3.0 Flash leads Claude3.5, OpenAI GPT-4o, o3 and other models in the multimodal RAG benchmark Docmatix, with accuracy performance only second to GPT-5.1.
|
| 129 |
-
|
| 130 |
-
**Model Average Accuracy Comparison**
|
| 131 |
-
|
| 132 |
| Models | Avg. |
|
| 133 |
|--------|:---------:|
|
| 134 |
| **Qwen2.5-VL-72B-Instruct** | 59.75 |
|
| 135 |
-
| **InternVL3-78B** | 42.99 |
|
| 136 |
-
| **Claude3.5-Sonnet** | 42.55 |
|
| 137 |
-
| **OpenAI GPT-4o** | 56.79 |
|
| 138 |
-
| **OpenAI GPT-o3** | 45.57 |
|
| 139 |
| **OpenAI GPT-4V** | 60.10 |
|
| 140 |
-
| **OpenAI GPT-5.1** | 48.52 |
|
| 141 |
| **Yuan3.0 Flash** | **65.07** |
|
| 142 |
|
| 143 |
-
|
| 144 |
-
*<small>**Docmatix** - Evaluates the model's ability to retrieve information, correlate, and accurately answer questions across text, tables, images and other multimodal content in multi-page complex documents.</small>*
|
| 145 |
-
|
| 146 |
---
|
| 147 |
|
| 148 |
**5.3 Multimodal Complex Table Content Analysis Evaluation: MMTab** ๐
|
| 149 |
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
**
|
| 153 |
-
|
| 154 |
-
| Models | Avg. | TABMWP | WTQ | WTQ | HiTab | TAT-QA | FeTaQAU | TabFact | InfoTabs | HiTab_T2T | Rotowire | WikiBIO | TSD_Row | TSD_Col | TCE | TCL | MCD | RCE |
|
| 155 |
-
|--------|:----:|:------:|:---:|:---:|:-----:|:------:|:-------:|:-------:|:--------:|:---------:|:--------:|:-------:|:-------:|:-------:|:---:|:---:|:---:|:---:|
|
| 156 |
-
| **Zhipu GLM-4.5V** | 52.00 | 88.21 | 77.42 | 51.52 | 62.69 | 5.25 | 89.44 | 79.48 | 5.17 | 4.48 | 2.69 | 47.40 | 89.70 | 52.74 | 50.84 | 43.47 | 50.77 | 82.79 |
|
| 157 |
-
| **OpenAI GPT-4V** | 29.90 | 60.50 | 48.00 | 27.50 | 32.50 | 11.04 | 45.50 | 65.60 | 2.98 | 4.23 | 1.94 | 19.00 | 38.00 | 14.36 | 27.91 | 3.50 | 48.52 | 57.14 |
|
| 158 |
-
| **OpenAI GPT-5.1** | 55.15 | 64.95 | 60.77 | 77.77 | 61.37 | 8.70 | 52.81 | 64.30 | 44.16 | 17.81 | 11.95 | 96.60 | 62.10 | 86.43 | 44.66 | 72.46 | 53.58 | 57.20 |
|
| 159 |
-
| **Yuan3.0 Flash** | 58.29 | 95.09 | 68.23 | 69.80 | 69.17 | 28.42 | 87.32 | 83.50 | 13.30 | 14.74 | 17.26 | 46.60 | 82.80 | 56.77 | 56.98 | 65.20 | 62.07 | 73.67 |
|
| 160 |
|
| 161 |
---
|
| 162 |
|
| 163 |
**5.4 Text Summarization Generation Evaluation: SummEval** ๐
|
| 164 |
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
**
|
|
|
|
| 168 |
|
|
|
|
| 169 |
|
| 170 |
-
|
| 171 |
-
|--------|:---------:|:-----------:|:-----------:|:--------------:|:------------:|
|
| 172 |
-
| **DeepSeek-V3** | 59.28 | 25.50 | 9.20 | 86.30 | 68.20 |
|
| 173 |
-
| **DeepSeek-V3.2** | 51.36 | 33.30 | 11.92 | 85.61 | 41.76 |
|
| 174 |
-
| **Gemini-2.0-Flash** | 45.35 | 24.80 | 8.70 | 85.70 | 29.50 |
|
| 175 |
-
| **Claude-3.5-Sonnet** | 45.43 | 24.10 | 8.30 | 85.20 | 30.70 |
|
| 176 |
-
| **OpenAI GPT-4o** | 46.53 | 25.00 | 8.90 | 85.90 | 32.50 |
|
| 177 |
-
| **OpenAI GPT-5.1** | 49.44 | 27.48 | 10.16 | 84.63 | 40.50 |
|
| 178 |
-
| **Yuan3.0 Flash** | **59.31** | 51.32 | 28.32 | 89.99 | 45.34 |
|
| 179 |
|
|
|
|
|
|
|
| 180 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: other
|
| 3 |
+
library_name: transformers
|
| 4 |
+
pipeline_tag: image-text-to-text
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
<div align="center">
|
| 8 |
<h1>
|
| 9 |
Yuan 3.0 Multimodal Foundation Model
|
|
|
|
| 21 |
<a href="https://arxiv.org/abs/2601.01718"><img alt="arXiv"
|
| 22 |
src="https://img.shields.io/badge/arXiv-Yuan3.0%20Paper-b31b1b?logo=arxiv&logoColor=white"/></a>
|
| 23 |
</a>
|
|
|
|
|
|
|
|
|
|
| 24 |
</div>
|
| 25 |
|
|
|
|
| 26 |
-----
|
| 27 |
|
| 28 |
+
This repository contains **Yuan 3.0 Flash**, a Mixture-of-Experts (MoE) Multimodal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks. It was introduced in the paper [Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications](https://huggingface.co/papers/2601.01718).
|
| 29 |
|
| 30 |
## Latest Updates ๐๐
|
| 31 |
|
| 32 |
* **[2025-12-30]** **Released Yuan 3.0-40B Multimodal Large Language Model, a high-performance model for enterprise-grade application scenarios: Yuan3.0 Flash**
|
| 33 |
|
|
|
|
|
|
|
| 34 |
## 1. Introduction
|
| 35 |
|
| 36 |
+
Yuan 3.0 Flash, developed by the **YuanLab.ai team**, is a **40B parameter multimodal foundation model** that employs a Mixture of Experts (MoE) architecture, activating only approximately **3.7B parameters** per inference. Through innovative reinforcement learning training methods (RAPO), it significantly reduces inference token consumption while improving reasoning accuracy, exploring the innovative path of "less computation, higher intelligence" for large language models.
|
| 37 |
|
| 38 |
<div align="center">
|
| 39 |
<img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-architecture.png" width="80%" />
|
|
|
|
| 55 |
|
| 56 |
<div align="center">
|
| 57 |
<img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-benchmarks.png" width="80%" />
|
| 58 |
+
Fig.2: Yuan3.0 Flash Evaluation Results
|
| 59 |
</div>
|
| 60 |
|
|
|
|
|
|
|
| 61 |
## 3. Core Technology
|
| 62 |
|
| 63 |
### RAPO Reinforcement Learning Algorithm
|
|
|
|
| 74 |
| RL+DAPO length-penalty | 46.35% | 13,781 tokens | 89.06% | 3,974 tokens |
|
| 75 |
| **RL+RIRM** | **47.92%** | **7,505 tokens** | **89.47%** | **1,777 tokens** |
|
| 76 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 77 |
## 4. Model Download
|
| 78 |
|
| 79 |
**We provide download links for multiple model formats:**
|
|
|
|
| 83 |
| Yuan3.0 Flash | 40B | 16bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLabAI/Yuan3.0-Flash)
|
| 84 |
| Yuan3.0 Flash 4bit | 40B | 4bit | 128K | HuggingFace | [ModelScope]( https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash-int4) \| [HuggingFace]( https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit) \| [WiseModel]( https://www.wisemodel.cn/models/YuanLab/Yuan3.0-Flash-4bit)
|
| 85 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
## 5. Evaluation Results
|
| 87 |
|
| 88 |
**5.1 Text-based RAG Evaluation: ChatRAG** ๐
|
|
|
|
| 91 |
|
| 92 |
**Model Average Accuracy Comparison**
|
| 93 |
|
|
|
|
| 94 |
| Models | Avg All | D2D | QuAC | QReCC | CoQA | DoQA | CFQA | SQA | TCQA | HDial | INSCIT |
|
| 95 |
|--------|---------|-----|------|-------|------|------|------|-----|------|-------|--------|
|
| 96 |
| **DeepSeek-V3** | 50.47 | 31.59 | 28.86 | 49.31 | 76.98 | 26.11 | 83.49 | 82.13 | 46.69 | 47.43 | 32.08 |
|
|
|
|
| 97 |
| **OpenAI GPT-4o** | 50.54 | 32.76 | 26.56 | 49.30 | 76.11 | 28.78 | 81.85 | 81.14 | 49.75 | 41.29 | 26.69 |
|
|
|
|
|
|
|
|
|
|
| 98 |
| **Yuan3.0 Flash** | **64.47** | 49.82 | 53.79 | 57.08 | 90.93 | 59.99 | 74.40 | 87.52 | 66.31 | 68.45 | 36.40 |
|
| 99 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 100 |
---
|
| 101 |
|
|
|
|
| 102 |
**5.2 Multimodal RAG Evaluation: Docmatix** ๐
|
| 103 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 104 |
| Models | Avg. |
|
| 105 |
|--------|:---------:|
|
| 106 |
| **Qwen2.5-VL-72B-Instruct** | 59.75 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 107 |
| **OpenAI GPT-4V** | 60.10 |
|
|
|
|
| 108 |
| **Yuan3.0 Flash** | **65.07** |
|
| 109 |
|
|
|
|
|
|
|
|
|
|
| 110 |
---
|
| 111 |
|
| 112 |
**5.3 Multimodal Complex Table Content Analysis Evaluation: MMTab** ๐
|
| 113 |
|
| 114 |
+
| Models | Avg. | TABMWP | WTQ | WTQ | HiTab |
|
| 115 |
+
|--------|:----:|:------:|:---:|:---:|:-----:|
|
| 116 |
+
| **OpenAI GPT-5.1** | 55.15 | 64.95 | 60.77 | 77.77 | 61.37 |
|
| 117 |
+
| **Yuan3.0 Flash** | 58.29 | 95.09 | 68.23 | 69.80 | 69.17 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 118 |
|
| 119 |
---
|
| 120 |
|
| 121 |
**5.4 Text Summarization Generation Evaluation: SummEval** ๐
|
| 122 |
|
| 123 |
+
| Models | Avg. | Lexical Overlap ROUGE-1 | Semantic Similarity BERTScore | Factual Consistency SummaC |
|
| 124 |
+
|--------|:---------:|:-----------:|:--------------:|:------------:|
|
| 125 |
+
| **DeepSeek-V3** | 59.28 | 25.50 | 86.30 | 68.20 |
|
| 126 |
+
| **Yuan3.0 Flash** | **59.31** | 51.32 | 89.99 | 45.34 |
|
| 127 |
|
| 128 |
+
## 6. Quick Start
|
| 129 |
|
| 130 |
+
For specific usage methods, please refer to the official [QuickStart](https://github.com/Yuan-lab-LLM/Yuan3.0/blob/main/vllm/README_Yuan.md) guide.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 131 |
|
| 132 |
+
## 7. License Agreement
|
| 133 |
+
The use of Yuan 3.0 code and models must comply with the [ใYuan 3.0 Model License Agreementใ](https://github.com/Yuan-lab-LLM/Yuan3.0?tab=License-1-ov-file). The Yuan 3.0 model supports commercial use without requiring authorization application.
|
| 134 |
|
| 135 |
+
## 8. Citation
|
| 136 |
+
```bibtex
|
| 137 |
+
@article{yuan3flash2025,
|
| 138 |
+
title={Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications},
|
| 139 |
+
author={YuanLab.ai and others},
|
| 140 |
+
journal={arXiv preprint arXiv:2601.01718},
|
| 141 |
+
year={2025}
|
| 142 |
+
}
|
| 143 |
+
```
|