Update README.md
README.md
CHANGED
|
@@ -33,7 +33,7 @@ Yuan3.0 Ultra employs a unified multimodal model architecture, integrating a vis
|
|
| 33 |
|
| 34 |
Yuan3.0 Ultra enhances the Reflection Inhibition Reward Mechanism (RIRM) proposed in <a href="https://github.com/Yuan-lab-LLM/Yuan3.0" target="_blank">**Yuan3.0 Flash**</a>. By incorporating reward constraints based on the number of reflection steps, the model actively reduces ineffective reflections after arriving at the "first correct answer," while retaining the necessary reasoning depth for complex problems. This approach effectively mitigates the "overthinking" phenomenon in fast-thinking reinforcement learning. Training results demonstrate that under this controlled fast-thinking strategy, the model’s accuracy improves significantly, while the number of tokens generated during reasoning continually decreases—achieving simultaneous gains in both accuracy and computational efficiency.
|
| 35 |
|
| 36 |
-
Additionally, the <a href="./Docs/Yuan3.0_Ultra
|
| 37 |
|
| 38 |
<div align=center> <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Ultra/resolve/main/docs/Yuan3.0-Ultra-architecture.png" width="80%" />
|
| 39 |
|
|
@@ -204,22 +204,7 @@ Spider 1.0 and BIRD are two major benchmarks in the Text-to-SQL domain. Yuan3.0
|
|
| 204 |
| **Yuan3.0 Ultra** | **83.9** | 39.2 |
|
| 205 |
|
| 206 |
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
## 6. Quick Start
|
| 210 |
-
|
| 211 |
-
### 6.1 Yuan3.0 Ultra Inference
|
| 212 |
-
|
| 213 |
-
Yuan3.0 Ultra supports both bfloat16 and int4 quantized models. For usage details, please refer to [QuickStart](vllm/README_Yuan.md).
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
### 6.2 Yuan3.0 Ultra Training
|
| 217 |
-
|
| 218 |
-
We provide supervised fine-tuning scripts and reinforcement learning scripts for Yuan3.0 Ultra. Please refer to the fine-tuning training [documentation](rlhf/docs/instruct_tuning.md) and reinforcement learning [documentation](rlhf/docs/RL_training.md).
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
## 7. License
|
| 223 |
Use of Yuan 3.0 code and models must comply with the [Yuan 3.0 Model License Agreement](https://github.com/Yuan-lab-LLM/Yuan3.0?tab=License-1-ov-file). Yuan 3.0 models support commercial use and do not require an application for authorization. Please familiarize yourself with and adhere to the agreement. Do not use the open-source models, code, or any derivatives produced from this open-source project for any purposes that may cause harm to the nation or society, or for any services that have not undergone safety assessment and registration.
|
| 224 |
|
| 225 |
Although measures have been taken during training to ensure data compliance and accuracy to the best of our ability, given the enormous scale of model parameters and the influence of probabilistic randomness, we cannot guarantee the accuracy of generated outputs, and models are susceptible to being misled by input instructions. This project assumes no responsibility for data security risks, public opinion risks, or any risks and liabilities arising from the model being misled, misused, disseminated, or improperly exploited due to the use of open-source models and code. You shall bear full and sole responsibility for all risks and consequences arising from your use, copying, distribution, and modification of this open-source project.
|
|
|
|
| 33 |
|
| 34 |
Yuan3.0 Ultra enhances the Reflection Inhibition Reward Mechanism (RIRM) proposed in <a href="https://github.com/Yuan-lab-LLM/Yuan3.0" target="_blank">**Yuan3.0 Flash**</a>. By incorporating reward constraints based on the number of reflection steps, the model actively reduces ineffective reflections after arriving at the "first correct answer," while retaining the necessary reasoning depth for complex problems. This approach effectively mitigates the "overthinking" phenomenon in fast-thinking reinforcement learning. Training results demonstrate that under this controlled fast-thinking strategy, the model’s accuracy improves significantly, while the number of tokens generated during reasoning continually decreases—achieving simultaneous gains in both accuracy and computational efficiency.
|
| 35 |
|
| 36 |
+
Additionally, the <a href="https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra/blob/main/Docs/Yuan3.0_Ultra%20Paper.pdf">**technical report**</a> for Yuan3.0 Ultra has been released, which provides more detailed technical specifications and evaluation results.
|
| 37 |
|
| 38 |
<div align=center> <img src="https://huggingface.co/YuanLabAI/Yuan3.0-Ultra/resolve/main/docs/Yuan3.0-Ultra-architecture.png" width="80%" />
|
| 39 |
|
|
|
|
| 204 |
| **Yuan3.0 Ultra** | **83.9** | 39.2 |
|
| 205 |
|
| 206 |
|
| 207 |
+
## 6. License
|
| 208 |
Use of Yuan 3.0 code and models must comply with the [Yuan 3.0 Model License Agreement](https://github.com/Yuan-lab-LLM/Yuan3.0?tab=License-1-ov-file). Yuan 3.0 models support commercial use and do not require an application for authorization. Please familiarize yourself with and adhere to the agreement. Do not use the open-source models, code, or any derivatives produced from this open-source project for any purposes that may cause harm to the nation or society, or for any services that have not undergone safety assessment and registration.
|
| 209 |
|
| 210 |
Although measures have been taken during training to ensure data compliance and accuracy to the best of our ability, given the enormous scale of model parameters and the influence of probabilistic randomness, we cannot guarantee the accuracy of generated outputs, and models are susceptible to being misled by input instructions. This project assumes no responsibility for data security risks, public opinion risks, or any risks and liabilities arising from the model being misled, misused, disseminated, or improperly exploited due to the use of open-source models and code. You shall bear full and sole responsibility for all risks and consequences arising from your use, copying, distribution, and modification of this open-source project.
|