metadata
license: other
library_name: transformers
pipeline_tag: image-text-to-text

Yuan 3.0 Multimodal Foundation Model



This repository contains Yuan 3.0 Flash, a Mixture-of-Experts (MoE) Multimodal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks. It was introduced in the paper Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications.

Latest Updates 🎉🎉

  • [2025-12-30] Released Yuan 3.0-40B Multimodal Large Language Model, a high-performance model for enterprise-grade application scenarios: Yuan3.0 Flash

1. Introduction

Yuan 3.0 Flash, developed by the YuanLab.ai team, is a 40B-parameter multimodal foundation model built on a Mixture of Experts (MoE) architecture that activates only about 3.7B parameters per inference. Its reinforcement learning training method, RAPO, significantly reduces inference token consumption while improving reasoning accuracy, pursuing a path of "less computation, higher intelligence" for large language models.
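To make the sparse-activation idea concrete, the sketch below routes a single input through a toy top-k MoE layer. It is a generic illustration, not Yuan 3.0 Flash's actual architecture: the expert count, `top_k`, the gate logits, and the scalar "experts" are all hypothetical. Only the 3.7B and 40B figures come from the text above.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_logits, experts, top_k=2):
    """Route input x to the top_k experts and mix their outputs."""
    probs = softmax(gate_logits)
    # Indices of the top_k highest-probability experts.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Hypothetical toy experts: each one simply scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
output, chosen = moe_forward(1.0, gate_logits=[0.1, 2.0, 0.3, 1.5], experts=experts)

# Share of parameters active per inference, from the figures in the text.
active_fraction = 3.7 / 40
print(sorted(chosen), f"{active_fraction:.2%}")
```

Only two of the four toy experts run for this input, which mirrors how roughly 9.25% of the model's parameters (3.7B of 40B) are active at a time.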

Fig.1: Yuan3.0 Multimodal Large Language Model Architecture

Core Features

  • 🚀 Efficient Inference: Reduces inference token consumption by up to 75%, significantly lowering costs
  • 🎯 Enterprise-Grade Optimization: Deeply optimized for enterprise scenarios such as RAG, document understanding, and table analysis
  • 🎨 Multimodal Support: Supports text, image, table, document and other multimodal inputs
  • 📚 Long Context: Supports 128K context length, achieving 100% accuracy in "Needle in a Haystack" tests
  • ⚡ Ready-to-Use Intelligence: Default inference mode meets the needs of most enterprise scenarios

2. Performance

Yuan 3.0 Flash outperforms GPT-5.1 on enterprise-grade RAG, multimodal retrieval, table understanding, summary generation, and other tasks. With 40B parameters, it reaches the reasoning accuracy of 235B- and 671B-parameter models while reducing token consumption by 50%-75%, giving enterprises a high-performance, low-cost large language model solution.

Fig.2: Yuan3.0 Flash Evaluation Results

3. Core Technology

RAPO Reinforcement Learning Algorithm

The Reflection-aware Adaptive Policy Optimization (RAPO) algorithm uses the Reflection Inhibition Reward Mechanism (RIRM), which:

  • ✅ Identifies the key point where the correct answer is first obtained
  • 🎯 Suppresses subsequent redundant reasoning behavior
  • 📉 Improves accuracy while reducing inference token count by approximately 75%
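
The idea in the bullets above can be illustrated with a toy reward function. This is a minimal sketch under stated assumptions, not the paper's RIRM formulation: the function name `reflection_inhibition_reward`, the `penalty` weight, and the representation of a response as a list of per-step intermediate answers are all hypothetical.

```python
def reflection_inhibition_reward(step_answers, gold, penalty=0.1):
    """Toy reward: 1.0 for a correct final answer, minus a penalty for each
    reasoning step taken after the correct answer first appeared.

    step_answers: the intermediate answer held after each reasoning step
                  (a hypothetical representation of a response).
    """
    if not step_answers or step_answers[-1] != gold:
        return 0.0  # a wrong final answer earns no reward
    first_correct = step_answers.index(gold)            # first step with the correct answer
    redundant = len(step_answers) - 1 - first_correct   # steps spent after that point
    return max(0.0, 1.0 - penalty * redundant)

concise = ["12", "42"]                     # stops once the answer is found
verbose = ["12", "42", "42", "42", "42"]   # keeps reflecting after finding it
wrong = ["12", "42", "17"]                 # talks itself out of the answer

print(reflection_inhibition_reward(concise, "42"))
print(reflection_inhibition_reward(verbose, "42"))
print(reflection_inhibition_reward(wrong, "42"))
```

Under this toy scheme the concise response scores highest, the verbose but correct response is discounted for its three redundant steps, and the incorrect response scores zero.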

| Training Method | AIME 2024 Accuracy | AIME 2024 Avg Output Length | MATH-500 Accuracy | MATH-500 Avg Output Length |
| --- | --- | --- | --- | --- |
| Yuan3.0 Flash (40B) SFT | 31.45% | 13,656 tokens | 83.20% | 3,362 tokens |
| RL+DAPO length-penalty | 46.35% | 13,781 tokens | 89.06% | 3,974 tokens |
| RL+RIRM | 47.92% | 7,505 tokens | 89.47% | 1,777 tokens |
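
The output-length savings in the table above can be worked out directly. The snippet below is a small arithmetic sketch: the numbers are copied from the table, while the dictionary layout and the helper name `token_reduction` are illustrative choices.

```python
# (accuracy %, average output tokens) copied from the table above.
results = {
    "Yuan3.0 Flash (40B) SFT": {"AIME 2024": (31.45, 13656), "MATH-500": (83.20, 3362)},
    "RL+DAPO length-penalty":  {"AIME 2024": (46.35, 13781), "MATH-500": (89.06, 3974)},
    "RL+RIRM":                 {"AIME 2024": (47.92, 7505),  "MATH-500": (89.47, 1777)},
}

def token_reduction(baseline, method, benchmark):
    """Relative drop in average output length, as a fraction of the baseline."""
    base_tokens = results[baseline][benchmark][1]
    new_tokens = results[method][benchmark][1]
    return (base_tokens - new_tokens) / base_tokens

for benchmark in ("AIME 2024", "MATH-500"):
    for baseline in ("Yuan3.0 Flash (40B) SFT", "RL+DAPO length-penalty"):
        r = token_reduction(baseline, "RL+RIRM", benchmark)
        print(f"{benchmark}: RL+RIRM vs {baseline}: {r:.1%} fewer tokens")
```

On these two benchmarks, RL+RIRM produces roughly 45% to 55% fewer output tokens than either baseline while also scoring higher on accuracy. The approximately 75% figure quoted elsewhere in this document is not derivable from these rows alone.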

4. Model Download

We provide download links for multiple model formats:

| Model | Parameters | Precision | Sequence Length | Model Format | Download Link |
| --- | --- | --- | --- | --- | --- |
| Yuan3.0 Flash | 40B | 16bit | 128K | HuggingFace | ModelScope, HuggingFace, WiseModel |
| Yuan3.0 Flash 4bit | 40B | 4bit | 128K | HuggingFace | ModelScope, HuggingFace, WiseModel |
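
As a rough guide to choosing between the two precisions, the sketch below estimates the memory needed for the weights alone. It is a back-of-the-envelope calculation, not a measured figure: it ignores activations, the KV cache, and any quantization metadata such as scales, and the helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 40e9  # total parameter count, from the table above

for label, bits in (("16bit", 16), ("4bit", 4)):
    print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB of weights")
```

This gives about 80 GB for the 16bit checkpoint and about 20 GB for the 4bit one. In an MoE model all experts typically need to be resident even though only a subset is active per inference, so the estimate uses the total parameter count rather than the activated 3.7B.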

5. Evaluation Results

5.1 Text-based RAG Evaluation: ChatRAG 🏆

Yuan 3.0 Flash leads DeepSeek-V3, DeepSeek-R1 and other large language models in average accuracy across 10 evaluation tasks in the industry-standard RAG benchmark ChatRAG.

Model Average Accuracy Comparison

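| Models | Avg All | D2D | QuAC | QReCC | CoQA | DoQA | CFQA | SQA | TCQA | HDial | INSCIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-V3 | 50.47 | 31.59 | 28.86 | 49.31 | 76.98 | 26.11 | 83.49 | 82.13 | 46.69 | 47.43 | 32.08 |
| OpenAI GPT-4o | 50.54 | 32.76 | 26.56 | 49.30 | 76.11 | 28.78 | 81.85 | 81.14 | 49.75 | 41.29 | 26.69 |
| Yuan3.0 Flash | 64.47 | 49.82 | 53.79 | 57.08 | 90.93 | 59.99 | 74.40 | 87.52 | 66.31 | 68.45 | 36.40 |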
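
The "Avg All" column appears to be an unweighted mean over the 10 tasks. The sketch below recomputes it from the per-task scores listed above; the scores are copied from the table, while the dictionary layout and the helper name `macro_average` are illustrative.

```python
# Per-task scores copied from the table above, in column order:
# D2D, QuAC, QReCC, CoQA, DoQA, CFQA, SQA, TCQA, HDial, INSCIT.
rows = {
    "DeepSeek-V3":   (50.47, [31.59, 28.86, 49.31, 76.98, 26.11, 83.49, 82.13, 46.69, 47.43, 32.08]),
    "OpenAI GPT-4o": (50.54, [32.76, 26.56, 49.30, 76.11, 28.78, 81.85, 81.14, 49.75, 41.29, 26.69]),
    "Yuan3.0 Flash": (64.47, [49.82, 53.79, 57.08, 90.93, 59.99, 74.40, 87.52, 66.31, 68.45, 36.40]),
}

def macro_average(scores):
    """Unweighted mean across tasks."""
    return sum(scores) / len(scores)

for model, (reported, scores) in rows.items():
    computed = macro_average(scores)
    print(f"{model}: reported {reported:.2f}, computed {computed:.2f}")
```

The recomputed means match the reported averages for DeepSeek-V3 (50.47) and Yuan3.0 Flash (64.47). The OpenAI GPT-4o row as listed here averages to about 49.42 rather than the reported 50.54, so that row is worth checking against the paper.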

5.2 Multimodal RAG Evaluation: Docmatix 🏆

| Models | Avg. |
| --- | --- |
| Qwen2.5-VL-72B-Instruct | 59.75 |
| OpenAI GPT-4V | 60.10 |
| Yuan3.0 Flash | 65.07 |

5.3 Multimodal Complex Table Content Analysis Evaluation: MMTab 🏆

| Models | Avg. | TABMWP | WTQ | WTQ | HiTab |
| --- | --- | --- | --- | --- | --- |
| OpenAI GPT-5.1 | 55.15 | 64.95 | 60.77 | 77.77 | 61.37 |
| Yuan3.0 Flash | 58.29 | 95.09 | 68.23 | 69.80 | 69.17 |

5.4 Text Summarization Generation Evaluation: SummEval 🏆

| Models | Avg. | Lexical Overlap (ROUGE-1) | Semantic Similarity (BERTScore) | Factual Consistency (SummaC) |
| --- | --- | --- | --- | --- |
| DeepSeek-V3 | 59.28 | 25.50 | 86.30 | 68.20 |
| Yuan3.0 Flash | 59.31 | 51.32 | 89.99 | 45.34 |

6. Quick Start

For usage instructions, please refer to the official QuickStart guide.

7. License Agreement

Use of the Yuan 3.0 code and models must comply with the 《Yuan 3.0 Model License Agreement》. The Yuan 3.0 model may be used commercially without applying for authorization.

8. Citation

@article{yuan3flash2025,
  title={Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications},
  author={YuanLab.ai and others},
  journal={arXiv preprint arXiv:2601.01718},
  year={2025}
}