---
frameworks:
- Pytorch
license: Apache License 2.0
tasks:
- text-generation
---

# AI-Flow-Ruyi (如意大模型)

<p align="center">
<img src="assets/AI-Flow-Ruyi-logo.png" width="500" />
</p>

<p align="center">
<a href="README.md">中文</a> &nbsp;|&nbsp; <a href="README_en.md">English</a>
<br>
🤗 <a href="https://huggingface.co/TeleAI-AI-Flow/AI-Flow-Ruyi-7B-Preview0704">Hugging Face</a>&nbsp;&nbsp;|&nbsp;&nbsp;🤖 <a href="https://www.modelscope.cn/models/TeleAI-AI-Flow/AI-Flow-Ruyi-7B-Preview0704/">ModelScope</a>&nbsp;&nbsp;|&nbsp;&nbsp;📑&nbsp;<a href="https://www.arxiv.org/abs/2506.12479">Paper</a>
</p>

#### Long long ago...
> Deep within the Loong King's palace lay a divine rod, capable of infinite transformations, shrinking and growing at will. One day, finding himself at leisure, the Loong King mused to the rod: "With such formidable power, if only you could aid our Loong tribe in new endeavors." Suddenly, the rod spoke in reply: "I have an idea. What if this transformative ability were used to help humankind solve their problems?" No sooner said than done. The rod shimmered and transformed into an immensely powerful model, its "capabilities" freely scaling to match the complexity of any challenge. Beholding this marvel, the Loong King exclaimed, "Why, this is a true 'Ruyi' treasure—a wish-fulfilling aid to resolve all manner of troubles!" Thus, he named it "Ruyi" and sent it forth into the human world to offer its assistance.

## News

* 🎉🎉 [2025/7/4]: AI-Flow-Ruyi-7B-Preview released!

## Introduction

**AI-Flow-Ruyi** is a **Familial Model** developed by the AI Flow team led by Professor Li Xuelong, CTO and Chief Scientist of China Telecom and President of its Institute of Artificial Intelligence (TeleAI). Designed for next-generation "Device-Edge-Cloud" model service architectures, its core innovation is a set of **shared familial parameters** across large and small models. Leveraging an **early-exit mechanism**, the system dynamically routes queries to branch models of the appropriate parameter size based on problem complexity. The branches operate independently, yet their shared features enable **information sharing** and **seamless transitions** between them. Combined with distributed Device-Edge-Cloud deployment, this enables **collaborative inference** within the model family and significantly improves distributed inference efficiency.

![](assets/ai-flow.png)
![](assets/ruyi_model.png)
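
The dynamic routing described above can be sketched as a tiny dispatcher. The complexity heuristic below (prompt length) and the `route_query` helper are illustrative assumptions only, not part of the released system; the exit layers 11/19/27 correspond to the ~3B/5B/7B branches.

```python
# Illustrative sketch of query routing in a familial model: a complexity
# heuristic picks an early-exit layer, so simple queries stop at a small
# branch and complex ones fall through to the full 7B trunk. Prompt length
# is a deliberately crude stand-in for a real complexity estimator.
def route_query(prompt: str) -> int:
    """Return an early-exit layer (11/19/27 -> ~3B/5B/7B) for a prompt."""
    n_words = len(prompt.split())
    if n_words < 16:        # short, simple dialogue -> ~3B branch
        return 11
    if n_words < 128:       # everyday general-purpose tasks -> ~5B branch
        return 19
    return 27               # long, complex problems -> full ~7B branch
```

In a real deployment the small branches would run on device or edge and only escalate to the cloud-hosted trunk when needed.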

## AI-Flow-Ruyi-7B-Preview

The AI-Flow-Ruyi-7B-Preview was released on July 4th. Its largest branch runs at 7B parameters and can differentiate into early-exit variants with **equivalent capacities of 3B, 4B, 5B, and 6B parameters**.

Key branch specializations:
* **3B/4B branches**: Optimized for simple dialogue scenarios, delivering **faster response times** with **minimal resource consumption**
* **5B/6B branches**: Targeting daily general-purpose tasks, **striking a balance** between capability and responsiveness
* **7B branch**: Designed for complex problem-solving, **exhibiting more well-rounded capabilities** across multiple dimensions, though with **moderately slower inference** and **higher resource demands**

|Position No.|Early-Exit Layer|Equivalent Model Size|Branch Designation|Target Scenario|
|:-:|:-:|:-:|:-:|:-:|
|1|Layer 11|3B|AI-Flow-Ruyi-7B-E3B|Simple dialogue|
|2|Layer 15|4B|AI-Flow-Ruyi-7B-E4B|Simple dialogue|
|3|Layer 19|5B|AI-Flow-Ruyi-7B-E5B|Daily tasks|
|4|Layer 23|6B|AI-Flow-Ruyi-7B-E6B|Daily tasks|
|5|Layer 27|7B|AI-Flow-Ruyi-7B-E7B|Complex problems|

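The branch table above can be expressed as a simple lookup from early-exit layer to equivalent size and target scenario; `select_exit_layer` is a hypothetical convenience helper, not part of the released API.

```python
# Branch table as data: early-exit layer -> (equivalent size, target scenario).
BRANCHES = {
    11: ("3B", "simple dialogue"),
    15: ("4B", "simple dialogue"),
    19: ("5B", "daily tasks"),
    23: ("6B", "daily tasks"),
    27: ("7B", "complex problems"),
}

def select_exit_layer(equivalent_size: str) -> int:
    """Map a desired equivalent model size (e.g. '5B') to its exit layer."""
    for layer, (size, _scenario) in BRANCHES.items():
        if size == equivalent_size:
            return layer
    raise ValueError(f"unknown branch size: {equivalent_size!r}")
```

The returned layer index is what the demo below passes to `set_global_val("early_exit_point", ...)`.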
### Training process

Before training, we initialized the 7B main branch with parameters from the Qwen team's pre-trained [Qwen2.5-7B](https://arxiv.org/abs/2412.15115) (itself pre-trained on 18 trillion high-quality tokens). Each early-exit branch's decoder layers were initialized with the parameters of the layer immediately following its early-exit position.

Following initialization, we conducted **multi-branch joint pre-training** on approximately 400 billion tokens of proprietary high-quality data, producing the AI-Flow-Ruyi-7B-Base foundation model.

We then performed **multi-branch joint instruction-following fine-tuning** across all branches with ~1.2 million high-quality instruction samples, yielding AI-Flow-Ruyi-7B-Preview.

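The branch-initialization rule can be illustrated with a toy sketch, using strings in place of real decoder-layer weights; the list-of-layers representation and the 0-based indexing convention are assumptions for illustration only.

```python
# Toy sketch: a branch exiting at layer k starts its extra decoder layer from
# a copy of trunk layer k + 1. The 7B trunk here is modeled as 28 decoder
# layers; strings stand in for the actual parameter tensors.
trunk = [f"layer{i}.weights" for i in range(28)]

exit_positions = [11, 15, 19, 23]  # early-exit layers of the 3B/4B/5B/6B branches
branch_init = {k: trunk[k + 1] for k in exit_positions}
```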
### Performance review

We evaluate the model with [OpenCompass](https://github.com/open-compass/opencompass) and its official configuration files on multiple datasets in a 0-shot setting. The results show that the 7B main branch is roughly on par with Qwen2.5-7B-Instruct on general-purpose tasks.

<details>
<summary>Common tasks review</summary>

|Model|MMLU|MMLU-Pro|CMMLU|ARC-c|BBH|Mean|
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|Qwen3-8B(think)|74.78|66.02|76.33|63.39|60.68|68.24|
|Qwen2.5-7B-Instruct|70.88|56.33|75.71|86.44|51.51|68.17|
|Llama-3.1-8B-Instruct|53.16|45.36|51.65|83.73|72.47|61.27|
|AI-Flow-Ruyi-7B-E7B<b>(ours)</b>|87.19|59.78|48.14|69.83|74.47|67.88|

</details>

<details>
<summary>Code tasks review</summary>

|Model|MBPP|HumanEval|LiveCodeBench|Mean|
|:-:|:-:|:-:|:-:|:-:|
|Qwen3-8B(think)|78.60|84.76|63.10|75.49|
|Qwen2.5-7B-Instruct|70.82|84.15|34.55|63.17|
|Llama-3.1-8B-Instruct|68.48|63.41|8.15|46.68|
|AI-Flow-Ruyi-7B-E7B<b>(ours)</b>|66.93|64.63|30.01|53.86|

</details>

<details>
<summary>STEM tasks review</summary>

|Model|Math|GPQA|GSM-8K|Mean|
|:-:|:-:|:-:|:-:|:-:|
|Qwen3-8B(think)|83.84|38.38|93.03|71.75|
|Qwen2.5-7B-Instruct|73.66|35.35|88.48|65.83|
|Llama-3.1-8B-Instruct|49.22|25.25|85.82|53.43|
|AI-Flow-Ruyi-7B-E7B<b>(ours)</b>|44.94|24.75|81.65|50.45|

</details>


Meanwhile, the average performance of the early-exit branches increases monotonically with their equivalent parameter count.

|Model|MMLU|MMLU-Pro|CMMLU|ARC-c|BBH|Mean|
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|AI-Flow-Ruyi-7B-E3B<b>(ours)</b>|66.93|44.70|19.80|40.00|32.29|40.74|
|AI-Flow-Ruyi-7B-E4B<b>(ours)</b>|78.86|48.60|26.51|58.98|41.98|50.99|
|AI-Flow-Ruyi-7B-E5B<b>(ours)</b>|75.34|49.13|33.91|65.76|64.48|57.72|
|AI-Flow-Ruyi-7B-E6B<b>(ours)</b>|84.58|53.06|33.94|73.22|47.33|58.43|
|AI-Flow-Ruyi-7B-E7B<b>(ours)</b>|87.19|59.78|48.14|69.83|74.47|67.88|

## Usage

Step 1. Create and activate a virtual environment

```sh
conda create -n ruyi python=3.12
conda activate ruyi
```

Step 2. Clone this repository

```sh
git clone https://github.com/TeleAI-AI-Flow/AI-Flow-Ruyi.git
cd AI-Flow-Ruyi
```

Step 3. Install from source (note: compiling flash_attn is slow; we recommend downloading a prebuilt wheel from the [official repository](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1) and installing it manually)

```sh
pip install -e .
```

Step 4. Download the model weights

```sh
git clone https://www.modelscope.cn/TeleAI-AI-Flow/AI-Flow-Ruyi-7B-Preview0704.git models/AI-Flow-Ruyi-7B-Preview0704
```

Step 5. Run the demo

```sh
python demo.py
```

<details>
<summary>View demo code</summary>

```py
import torch
from ruyi.global_var import set_global_val
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_path = "models/AI-Flow-Ruyi-7B-Preview0704"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
).to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=30,
    top_p=0.95,
    temperature=0.6,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    max_new_tokens=8192,
)

# Input messages
messages = [
    {"role": "user", "content": "Introduce yourself."},
]

# Apply the chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Model generation
with torch.no_grad():
    # Set the early-exit point:
    # - 11: first early-exit point, equivalent to about 3B
    # - 15: second early-exit point, equivalent to about 4B
    # - 19: third early-exit point, equivalent to about 5B
    # - 23: fourth early-exit point, equivalent to about 6B
    # - 27: fifth early-exit point, equivalent to about 7B
    set_global_val("early_exit_point", 11)

    output = model.generate(
        inputs["input_ids"].to("cuda"),
        generation_config=generation_config,
    )

# Decode and print the result
generated_text = tokenizer.decode(output[0], skip_special_tokens=False)
print(generated_text)
```

</details>


## Citation

```bibtex
@misc{an2025aiflowperspectivesscenarios,
      title={AI Flow: Perspectives, Scenarios, and Approaches},
      author={Hongjun An and Wenhan Hu and Sida Huang and Siqi Huang and Ruanjun Li and Yuanzhi Liang and Jiawei Shao and Yiliang Song and Zihan Wang and Cheng Yuan and Chi Zhang and Hongyuan Zhang and Wenhao Zhuang and Xuelong Li},
      year={2025},
      eprint={2506.12479},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2506.12479},
}
```