---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
---

<div align="center">
<h1>
  Yuan 3.0 Multimodal Foundation Model
</h1>
</div>

<hr>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/Yuan-lab-LLM/Yuan3.0"><img alt="GitHub"
    src="https://img.shields.io/badge/GitHub-Yuan%203.0%20Repo-181717?logo=github&logoColor=white"/></a>
  <a href="https://www.modelscope.cn/profile/Yuanlab"><img alt="ModelScope"
    src="https://img.shields.io/badge/💾%20ModelScope-Yuan3.0-6b4fbb?color=6b4fbb&logoColor=white"/></a>
  <a href="https://x.com/YuanAI_Lab"><img alt="Twitter Follow"
    src="https://img.shields.io/badge/Twitter-Yuanlabai-white?logo=x&logoColor=white"/></a>
  <a href="https://arxiv.org/abs/2601.01718"><img alt="arXiv"
    src="https://img.shields.io/badge/arXiv-Yuan3.0%20Paper-b31b1b?logo=arxiv&logoColor=white"/></a>
</div>

-----

This repository contains **Yuan 3.0 Flash**, a Mixture-of-Experts (MoE) Multimodal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks. It was introduced in the paper [Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications](https://huggingface.co/papers/2601.01718).

## Latest Updates 🎉🎉

* **[2025-12-30]** Released **Yuan3.0 Flash**, a 40B multimodal large language model for enterprise-grade application scenarios.

## 1. Introduction

Yuan 3.0 Flash, developed by the **YuanLab.ai team**, is a **40B-parameter multimodal foundation model** built on a Mixture of Experts (MoE) architecture that activates only about **3.7B parameters** per inference. Through an innovative reinforcement learning method (RAPO), it significantly reduces inference token consumption while improving reasoning accuracy, pursuing "less computation, higher intelligence" for large language models.

<div align="center">
  <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-architecture.png" width="80%" />
Fig.1: Yuan3.0 Multimodal Large Language Model Architecture
</div>

### Core Features

- 🚀 **Efficient Inference**: Cuts inference token consumption by up to 75%, significantly lowering costs
- 🎯 **Enterprise-Grade Optimization**: Deeply optimized for enterprise scenarios such as RAG, document understanding, and table analysis
- 🎨 **Multimodal Support**: Accepts text, image, table, and document inputs
- 📚 **Long Context**: Supports a 128K context length, achieving 100% accuracy on "Needle in a Haystack" tests
- ⚡ **Ready-to-Use Intelligence**: The default inference mode meets the needs of most enterprise scenarios

## 2. Performance

Yuan 3.0 Flash outperforms GPT-5.1 on enterprise-grade RAG, multimodal retrieval, table understanding, summarization, and other tasks. With 40B parameters, it matches the reasoning accuracy of 235B- and 671B-parameter models while cutting token consumption by 50%-75%, giving enterprises a high-performance, low-cost large language model solution.


<div align="center">
  <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit/resolve/main/docs/Yuan3.0-benchmarks.png" width="80%" />
Fig.2: Yuan3.0 Flash Evaluation Results
</div>

## 3. Core Technology

### RAPO Reinforcement Learning Algorithm

The **Reflection-aware Adaptive Policy Optimization (RAPO)** algorithm, through its Reflection Inhibition Reward Mechanism (RIRM):

- ✅ Identifies the key point at which the correct answer is first reached
- 🎯 Suppresses redundant reasoning generated after that point
- 📉 Improves accuracy while cutting the inference token count by approximately 75%

| Training Method | AIME 2024 Accuracy | Avg Output Length | MATH-500 Accuracy | Avg Output Length |
|---------|------------------|--------------|-----------------|--------------|
| Yuan3.0 Flash (40B) SFT | 31.45% | 13,656 tokens | 83.20% | 3,362 tokens |
| RL+DAPO length-penalty | 46.35% | 13,781 tokens | 89.06% | 3,974 tokens |
| **RL+RIRM** | **47.92%** | **7,505 tokens** | **89.47%** | **1,777 tokens** |
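
The paper's exact RIRM formulation is not reproduced here, but the reward-shaping idea described above can be sketched as follows. This is a hypothetical simplification for illustration only: `step_correct`, `base_reward`, and `redundancy_penalty` are illustrative names, not the paper's notation.

```python
def rirm_reward(step_correct, base_reward=1.0, redundancy_penalty=0.01):
    """Sketch of a reflection-inhibition reward (hypothetical formulation).

    step_correct: one boolean per reasoning step, marking whether the
    model's current answer at that step is correct.
    Returns a scalar reward: full credit once a correct answer appears,
    minus a penalty for every step generated after the first correct one.
    """
    if not any(step_correct):
        return 0.0  # never reached a correct answer: no reward
    first = step_correct.index(True)           # key point: first correct answer
    redundant = len(step_correct) - first - 1  # redundant reflection steps
    return base_reward - redundancy_penalty * redundant
```

Under such a reward, a trajectory that keeps "reflecting" after already reaching the right answer scores lower than one that stops early, which is the behavior the table above quantifies.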

## 4. Model Download

**We provide download links for multiple model formats:**

|    Model     |   Parameters  |  Precision  |   Sequence Length  |   Model Format   |         Download Link         |
| :----------: | :------: | :------: | :------: | :-------: |:---------------------------: |
| Yuan3.0 Flash |    40B    |  16bit    |    128K    |    HuggingFace    | [ModelScope](https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash) \| [HuggingFace](https://huggingface.co/YuanLabAI/Yuan3.0-Flash) \| [WiseModel](https://www.wisemodel.cn/models/YuanLabAI/Yuan3.0-Flash)
| Yuan3.0 Flash 4bit |    40B   |  4bit     |    128K    |    HuggingFace    | [ModelScope](https://modelscope.cn/models/Yuanlab/Yuan3.0-Flash-int4) \| [HuggingFace](https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit) \| [WiseModel](https://www.wisemodel.cn/models/YuanLab/Yuan3.0-Flash-4bit)

## 5. Evaluation Results

**5.1 Text-based RAG Evaluation: ChatRAG** 🏆

Yuan 3.0 Flash leads DeepSeek-V3, DeepSeek-R1, and other large language models in average accuracy across the 10 evaluation tasks of the industry-standard RAG benchmark ChatRAG.

**Model Average Accuracy Comparison**

| Models | Avg All | D2D | QuAC | QReCC | CoQA | DoQA | CFQA | SQA | TCQA | HDial | INSCIT |
|--------|---------|-----|------|-------|------|------|------|-----|------|-------|--------|
| **DeepSeek-V3** | 50.47 | 31.59 | 28.86 | 49.31 | 76.98 | 26.11 | 83.49 | 82.13 | 46.69 | 47.43 | 32.08 |
| **OpenAI GPT-4o** | 50.54 | 32.76 | 26.56 | 49.30 | 76.11 | 28.78 | 81.85 | 81.14 | 49.75 | 41.29 | 26.69 |
| **Yuan3.0 Flash** | **64.47** | 49.82 | 53.79 | 57.08 | 90.93 | 59.99 | 74.40 | 87.52 | 66.31 | 68.45 | 36.40 |

---

**5.2 Multimodal RAG Evaluation: Docmatix** 🏆

| Models | Avg. |
|--------|:---------:|
| **Qwen2.5-VL-72B-Instruct** | 59.75 |
| **OpenAI GPT-4V** | 60.10 |
| **Yuan3.0 Flash** | **65.07** |

---

**5.3 Multimodal Complex Table Content Analysis Evaluation: MMTab** 🏆

| Models | Avg. | TABMWP | WTQ | WTQ | HiTab |
|--------|:----:|:------:|:---:|:---:|:-----:|
| **OpenAI GPT-5.1** | 55.15 | 64.95 | 60.77 | 77.77 | 61.37 |
| **Yuan3.0 Flash** | **58.29** | 95.09 | 68.23 | 69.80 | 69.17 |

---

**5.4 Text Summarization Evaluation: SummEval** 🏆

| Models | Avg. | Lexical Overlap ROUGE-1 | Semantic Similarity BERTScore | Factual Consistency SummaC |
|--------|:---------:|:-----------:|:--------------:|:------------:|
| **DeepSeek-V3** | 59.28 | 25.50 | 86.30 | 68.20 |
| **Yuan3.0 Flash** | **59.31** | 51.32 | 89.99 | 45.34 |
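
For context on the lexical-overlap column, a minimal ROUGE-1 F1 computation looks like the sketch below. This uses bare whitespace tokenization for illustration; reported scores come from the official scorer, which adds stemming and proper tokenization.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between a candidate summary and a reference
    (whitespace tokenization; an illustrative simplification of the metric)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```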

## 6. Quick Start

For specific usage methods, please refer to the official [QuickStart](https://github.com/Yuan-lab-LLM/Yuan3.0/blob/main/vllm/README_Yuan.md) guide.
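
As a rough starting point, the guide above covers serving with vLLM; an invocation might look like the following. The flags shown are common vLLM options, not taken from the linked guide, so verify them there before use.

```shell
pip install vllm

# Launch an OpenAI-compatible server for the model.
# --trust-remote-code is typically required for custom architectures;
# --max-model-len caps the 128K context window to fit available memory.
vllm serve YuanLabAI/Yuan3.0-Flash \
  --trust-remote-code \
  --max-model-len 32768
```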

## 7. License Agreement
The use of Yuan 3.0 code and models must comply with the [Yuan 3.0 Model License Agreement](https://github.com/Yuan-lab-LLM/Yuan3.0?tab=License-1-ov-file). The Yuan 3.0 model supports commercial use without requiring an authorization application.

## 8. Citation
```bibtex
@article{yuan3flash2025,
  title={Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications},
  author={YuanLab.ai and others},
  journal={arXiv preprint arXiv:2601.01718},
  year={2025}
}
```