anas bwshen-mi committed on
Commit
0f52ff6
·
0 Parent(s):

Duplicate from XiaomiMiMo/MiMo-V2-Flash

Browse files

Co-authored-by: Bowen Shen <bwshen-mi@users.noreply.huggingface.co>

This view is limited to 50 files because it contains too many changes; see the raw diff for the full set.

Files changed (50):
  1. .gitattributes +36 -0
  2. README.md +322 -0
  3. added_tokens.json +28 -0
  4. config.json +213 -0
  5. configuration_mimo_v2_flash.py +109 -0
  6. merges.txt +0 -0
  7. model.safetensors.index.json +0 -0
  8. model_0.safetensors +3 -0
  9. model_1.safetensors +3 -0
  10. model_10.safetensors +3 -0
  11. model_10_linear_fc1.safetensors +3 -0
  12. model_10_linear_fc2.safetensors +3 -0
  13. model_11.safetensors +3 -0
  14. model_11_linear_fc1.safetensors +3 -0
  15. model_11_linear_fc2.safetensors +3 -0
  16. model_12.safetensors +3 -0
  17. model_12_linear_fc1.safetensors +3 -0
  18. model_12_linear_fc2.safetensors +3 -0
  19. model_13.safetensors +3 -0
  20. model_13_linear_fc1.safetensors +3 -0
  21. model_13_linear_fc2.safetensors +3 -0
  22. model_14.safetensors +3 -0
  23. model_14_linear_fc1.safetensors +3 -0
  24. model_14_linear_fc2.safetensors +3 -0
  25. model_15.safetensors +3 -0
  26. model_15_linear_fc1.safetensors +3 -0
  27. model_15_linear_fc2.safetensors +3 -0
  28. model_16.safetensors +3 -0
  29. model_16_linear_fc1.safetensors +3 -0
  30. model_16_linear_fc2.safetensors +3 -0
  31. model_17.safetensors +3 -0
  32. model_17_linear_fc1.safetensors +3 -0
  33. model_17_linear_fc2.safetensors +3 -0
  34. model_18.safetensors +3 -0
  35. model_18_linear_fc1.safetensors +3 -0
  36. model_18_linear_fc2.safetensors +3 -0
  37. model_19.safetensors +3 -0
  38. model_19_linear_fc1.safetensors +3 -0
  39. model_19_linear_fc2.safetensors +3 -0
  40. model_1_linear_fc1.safetensors +3 -0
  41. model_1_linear_fc2.safetensors +3 -0
  42. model_2.safetensors +3 -0
  43. model_20.safetensors +3 -0
  44. model_20_linear_fc1.safetensors +3 -0
  45. model_20_linear_fc2.safetensors +3 -0
  46. model_21.safetensors +3 -0
  47. model_21_linear_fc1.safetensors +3 -0
  48. model_21_linear_fc2.safetensors +3 -0
  49. model_22.safetensors +3 -0
  50. model_22_linear_fc1.safetensors +3 -0
.gitattributes ADDED
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,322 @@
---
license: mit
library_name: transformers
---

<br/><br/>

<div align="center">
<picture>
<source srcset="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
<img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Xiaomi-MiMo" />
</picture>
</div>

<br/>

<div align="center" style="line-height: 1;">
|
<a href="https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash" target="_blank">🤗 HuggingFace</a>
&nbsp;|
<a href="https://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf" target="_blank">📔 Technical Report</a>
&nbsp;|
<a href="https://mimo.xiaomi.com/blog/mimo-v2-flash" target="_blank">📰 Blog</a>
&nbsp;|
<br/><br/>
<strong>Play around!</strong> &nbsp;
<a href="https://aistudio.xiaomimimo.com" target="_blank">🗨️ Xiaomi MiMo Studio</a>
&nbsp;
<a href="https://platform.xiaomimimo.com/" target="_blank">🎨 Xiaomi MiMo API Platform</a>
</div>
<br/>

# MiMo-V2-Flash

**MiMo-V2-Flash** is a Mixture-of-Experts (MoE) language model with **309B total parameters** and **15B active parameters**. Designed for high-speed reasoning and agentic workflows, it uses a novel hybrid attention architecture and Multi-Token Prediction (MTP) to achieve state-of-the-art performance while significantly reducing inference cost.

<p align="center">
  <img width="80%" src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/MiMo-v2-flash-performance.jpg?raw=true">
</p>

-----

## 1. Introduction

MiMo-V2-Flash strikes a new balance between long-context modeling capability and inference efficiency. Key features include:

* **Hybrid Attention Architecture**: Interleaves Sliding Window Attention (SWA) and Global Attention (GA) at a 5:1 ratio with an aggressive 128-token window. This reduces KV-cache storage by nearly 6x while maintaining long-context performance via a learnable **attention sink bias**.
* **Multi-Token Prediction (MTP)**: Equipped with a lightweight MTP module (0.33B params per block) built on dense FFNs. This triples output speed during inference and accelerates rollout in RL training.
* **Efficient Pre-Training**: Trained on 27T tokens using FP8 mixed precision at a native 32k sequence length; the context window extends up to 256k tokens.
* **Agentic Capabilities**: Post-training uses Multi-Teacher On-Policy Distillation (MOPD) and large-scale agentic RL, achieving superior performance on **SWE-Bench** and complex reasoning tasks.

-----

## 2. Model Downloads

| Model | Total Params | Active Params | Context Length | Download |
| :--- | :---: | :---: | :---: | :---: |
| **MiMo-V2-Flash-Base** | 309B | 15B | 256k | [🤗 HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash-Base) |
| **MiMo-V2-Flash** | 309B | 15B | 256k | [🤗 HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash) |

> [!IMPORTANT]
> We also open-source the 3-layer MTP weights to foster community research.

-----

## 3. Evaluation Results

### Base Model Evaluation

MiMo-V2-Flash-Base demonstrates strong performance across standard benchmarks, surpassing models with significantly larger parameter counts.

| Category | Benchmark | Setting/Length | MiMo-V2-Flash Base | Kimi-K2 Base | DeepSeek-V3.1 Base | DeepSeek-V3.2 Exp Base |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: |
| **Params** | **#Activated / #Total** | - | **15B / 309B** | **32B / 1043B** | **37B / 671B** | **37B / 671B** |
| **General** | BBH | 3-shot | 88.5 | 88.7 | 88.2 | 88.7 |
| | MMLU | 5-shot | 86.7 | 87.8 | 87.4 | 87.8 |
| | MMLU-Redux | 5-shot | 90.6 | 90.2 | 90.0 | 90.4 |
| | MMLU-Pro | 5-shot | 73.2 | 69.2 | 58.8 | 62.1 |
| | DROP | 3-shot | 84.7 | 83.6 | 86.3 | 86.6 |
| | ARC-Challenge | 25-shot | 95.9 | 96.2 | 95.6 | 95.5 |
| | HellaSwag | 10-shot | 88.5 | 94.6 | 89.2 | 89.4 |
| | WinoGrande | 5-shot | 83.8 | 85.3 | 85.9 | 85.6 |
| | TriviaQA | 5-shot | 80.3 | 85.1 | 83.5 | 83.9 |
| | GPQA-Diamond | 5-shot | 55.1 | 48.1 | 51.0 | 52.0 |
| | SuperGPQA | 5-shot | 41.1 | 44.7 | 42.3 | 43.6 |
| | SimpleQA | 5-shot | 20.6 | 35.3 | 26.3 | 27.0 |
| **Math** | GSM8K | 8-shot | 92.3 | 92.1 | 91.4 | 91.1 |
| | MATH | 4-shot | 71.0 | 70.2 | 62.6 | 62.5 |
| | AIME 24&25 | 2-shot | 35.3 | 31.6 | 21.6 | 24.8 |
| **Code** | HumanEval+ | 1-shot | 70.7 | 84.8 | 64.6 | 67.7 |
| | MBPP+ | 3-shot | 71.4 | 73.8 | 72.2 | 69.8 |
| | CRUXEval-I | 1-shot | 67.5 | 74.0 | 62.1 | 63.9 |
| | CRUXEval-O | 1-shot | 79.1 | 83.5 | 76.4 | 74.9 |
| | MultiPL-E HumanEval | 0-shot | 59.5 | 60.5 | 45.9 | 45.7 |
| | MultiPL-E MBPP | 0-shot | 56.7 | 58.8 | 52.5 | 50.6 |
| | BigCodeBench | 0-shot | 70.1 | 61.7 | 63.0 | 62.9 |
| | LiveCodeBench v6 | 1-shot | 30.8 | 26.3 | 24.8 | 24.9 |
| | SWE-Bench (AgentLess) | 3-shot | 30.8 | 28.2 | 24.8 | 9.4* |
| **Chinese** | C-Eval | 5-shot | 87.9 | 92.5 | 90.0 | 91.0 |
| | CMMLU | 5-shot | 87.4 | 90.9 | 88.8 | 88.9 |
| | C-SimpleQA | 5-shot | 61.5 | 77.6 | 70.9 | 68.0 |
| **Multilingual** | GlobalMMLU | 5-shot | 76.6 | 80.7 | 81.9 | 82.0 |
| | INCLUDE | 5-shot | 71.4 | 75.3 | 77.2 | 77.2 |
| **Long Context** | NIAH-Multi | 32K | 99.3 | 99.8 | 99.7 | 85.6* |
| | | 64K | 99.9 | 100.0 | 98.6 | 85.9* |
| | | 128K | 98.6 | 99.5 | 97.2 | 94.3* |
| | | 256K | 96.7 | - | - | - |
| | GSM-Infinite Hard | 16K | 37.7 | 34.6 | 41.5 | 50.4 |
| | | 32K | 33.7 | 26.1 | 38.8 | 45.2 |
| | | 64K | 31.5 | 16.0 | 34.7 | 32.6 |
| | | 128K | 29.0 | 8.8 | 28.7 | 25.7 |

> \* indicates the model may fail to follow the prompt or format.

### Post-training Model Evaluation

Following our post-training paradigm with MOPD and agentic RL, the model achieves state-of-the-art reasoning and agentic performance.

| Benchmark | MiMo-V2 Flash | Kimi-K2 Thinking | DeepSeek-V3.2 Thinking | Gemini-3.0 Pro | Claude Sonnet 4.5 | GPT-5 High |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Reasoning** | | | | | | |
| MMLU-Pro | 84.9 | 84.6 | 85.0 | 90.1 | 88.2 | 87.5 |
| GPQA-Diamond | 83.7 | 84.5 | 82.4 | 91.9 | 83.4 | 85.7 |
| HLE (no tools) | 22.1 | 23.9 | 25.1 | 37.5 | 13.7 | 26.3 |
| AIME 2025 | 94.1 | 94.5 | 93.1 | 95.0 | 87.0 | 94.6 |
| HMMT Feb. 2025 | 84.4 | 89.4 | 92.5 | 97.5 | 79.2 | 88.3 |
| LiveCodeBench-v6 | 80.6 | 83.1 | 83.3 | 90.7 | 64.0 | 84.5 |
| **General Writing** | | | | | | |
| Arena-Hard (Hard Prompt) | 54.1 | 71.9 | 53.4 | 72.6 | 63.3 | 71.9 |
| Arena-Hard (Creative Writing) | 86.2 | 80.1 | 88.8 | 93.6 | 76.7 | 92.2 |
| **Long Context** | | | | | | |
| LongBench V2 | 60.6 | 45.1 | 58.4 | 65.6 | 61.8 | - |
| MRCR | 45.7 | 44.2 | 55.5 | 89.7 | 55.4 | - |
| **Code Agent** | | | | | | |
| SWE-Bench Verified | 73.4 | 71.3 | 73.1 | 76.2 | 77.2 | 74.9 |
| SWE-Bench Multilingual | 71.7 | 61.1 | 70.2 | - | 68.0 | 55.3 |
| Terminal-Bench Hard | 30.5 | 30.6 | 35.4 | 39.0 | 33.3 | 30.5 |
| Terminal-Bench 2.0 | 38.5 | 35.7 | 46.4 | 54.2 | 42.8 | 35.2 |
| **General Agent** | | | | | | |
| BrowseComp | 45.4 | - | 51.4 | - | 24.1 | 54.9 |
| BrowseComp (w/ Context Manage) | 58.3 | 60.2 | 67.6 | 59.2 | - | - |
| \\(\tau^2\\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |

-----

## 4. Model Architecture

<p align="center">
  <img width="80%" src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/MiMo-v2-flash-arch.png?raw=true">
</p>

### Hybrid Sliding Window Attention

MiMo-V2-Flash addresses the quadratic complexity of long contexts by interleaving local Sliding Window Attention (SWA) and Global Attention (GA).

* **Configuration**: Stacks of \\(M=8\\) hybrid blocks; each block contains \\(N=5\\) SWA layers followed by 1 GA layer.
* **Efficiency**: SWA layers use a window size of 128 tokens, significantly reducing the KV cache.
* **Sink Bias**: A learnable attention sink bias maintains performance despite the aggressive window size.
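A back-of-the-envelope check of the claimed KV-cache saving (a sketch; it assumes equal KV size per layer and ignores sink handling):

```python
def kv_cache_reduction(context_len, swa_per_block=5, window=128):
    """KV-cache tokens for one hybrid block (5 SWA + 1 GA layers) vs. all-global layers."""
    # SWA layers only ever cache the last `window` tokens; the GA layer caches everything.
    hybrid = swa_per_block * min(context_len, window) + context_len
    # A fully global stack caches the whole context in every layer.
    full = (swa_per_block + 1) * context_len
    return full / hybrid

print(round(kv_cache_reduction(32_768), 1))  # ~5.9x at the native 32k length
```

The ratio approaches 6x as the context grows, matching the "nearly 6x" figure above.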

### Lightweight Multi-Token Prediction (MTP)

Unlike traditional speculative decoding, our MTP module is natively integrated for both training and inference.

* **Structure**: Uses a dense FFN (instead of MoE) and SWA (instead of GA) to keep the parameter count low (0.33B per block).
* **Performance**: Enables self-speculative decoding, tripling generation speed and mitigating GPU idleness during small-batch RL training.
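The self-speculative loop can be sketched with toy stand-in functions (hypothetical; not the actual MTP module): the draft head cheaply proposes a few tokens, the main model verifies them, and the agreeing prefix is accepted.

```python
def target_next(seq):
    # Stand-in for the main model's greedy next-token choice.
    return (seq[-1] * 31 + 7) % 100

def draft_next(seq):
    # Stand-in for the MTP draft head; agrees with the target most of the time.
    t = target_next(seq)
    return t if seq[-1] % 5 else (t + 1) % 100  # diverges when the last token is a multiple of 5

def speculative_step(seq, k=4):
    """One decode step: draft k tokens, accept the verified prefix, emit >= 1 token."""
    proposal, d = [], list(seq)
    for _ in range(k):
        tok = draft_next(d)
        proposal.append(tok)
        d.append(tok)
    out, accepted = list(seq), 0
    for tok in proposal:
        if target_next(out) != tok:
            break  # first disagreement: discard the rest of the draft
        out.append(tok)
        accepted += 1
    out.append(target_next(out))  # the target's own token after the accepted prefix
    return out, accepted + 1
```

With greedy acceptance the output is identical to decoding with the main model alone; the draft only changes how many verified tokens each step yields.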

-----

## 5. Post-Training Technical Highlights

MiMo-V2-Flash leverages a post-training pipeline designed to maximize reasoning and agentic capabilities through innovative distillation and reinforcement-learning strategies.

### 5.1 Multi-Teacher On-Policy Distillation (MOPD)

We introduce **Multi-Teacher On-Policy Distillation (MOPD)**, a new paradigm that formulates knowledge distillation as a reinforcement learning process.
* **Dense Token-Level Guidance**: Unlike methods relying on sparse sequence-level feedback, MOPD uses domain-specific expert models (teachers) to provide supervision at every token position.
* **On-Policy Optimization**: The student model learns from its own generated responses rather than a fixed dataset. This eliminates exposure bias and yields smaller, more stable gradient updates.
* **Inherent Reward Robustness**: Rewards are derived from the distribution divergence between student and teacher, making the process naturally resistant to reward hacking.
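A minimal illustration of the token-level reward idea (our construction for exposition; MOPD's exact objective is in the technical report): each generated token is scored by the negative divergence between the student's and teacher's next-token distributions, giving dense per-token feedback.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    # KL(p || q); softmax outputs are strictly positive, so this is well-defined.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def token_rewards(student_logits, teacher_logits):
    # One reward per token position: 0 when the student matches the teacher,
    # increasingly negative as the distributions diverge.
    return [-kl(softmax(s), softmax(t)) for s, t in zip(student_logits, teacher_logits)]

# Two positions over a 3-token vocabulary: matched at t=0, divergent at t=1.
rewards = token_rewards([[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
                        [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
```

Because the reward is a divergence rather than an external scorer, there is no separate reward model to exploit, which is the robustness property noted above.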

### 5.2 Scaling Agentic RL

We significantly scale up the agentic training environments to improve intelligence and generalization.
* **Massive Code Agent Environments**: We use real-world GitHub issues to create over 100,000 verifiable tasks. Our automated pipeline maintains a Kubernetes cluster capable of running over 10,000 concurrent pods with a 70% environment-setup success rate.
* **Multimodal Verifier for WebDev**: For web-development tasks, we employ a vision-based verifier that evaluates code execution via recorded videos rather than static screenshots. This reduces visual hallucination and ensures functional correctness.
* **Cross-Domain Generalization**: Our experiments show that large-scale RL training on code agents generalizes effectively to other domains, boosting performance on Math and General Agent tasks.

### 5.3 Advanced RL Infrastructure

To support high-throughput RL training for large-scale MoE models, we implemented several infrastructure optimizations on top of SGLang and Megatron-LM.
* **Rollout Routing Replay (R3)**: Addresses numerical-precision inconsistencies in MoE routing between inference and training. R3 reuses the exact routed experts from rollout during the training pass, ensuring consistency with negligible overhead.
* **Request-Level Prefix Cache**: In multi-turn agent training, this cache stores KV states and routed experts from prior turns. It avoids re-computation and ensures sampling consistency across turns.
* **Fine-Grained Data Scheduler**: We extend the rollout engine to schedule fine-grained sequences instead of micro-batches. Combined with partial rollout, this significantly reduces GPU idleness caused by long-tail stragglers.
* **Toolbox & Tool Manager**: A two-layer design using Ray actor pools to handle resource contention. It eliminates cold-start delays for tool execution and isolates task logic from system policies.

-----

## 6. Inference & Deployment

MiMo-V2-Flash supports FP8 mixed-precision inference. We recommend **SGLang** for optimal performance.

### Quick Start with SGLang

```bash
pip install sglang

# Launch server
python3 -m sglang.launch_server \
  --model-path XiaomiMiMo/MiMo-V2-Flash \
  --served-model-name mimo-v2-flash \
  --pp-size 1 \
  --dp-size 2 \
  --enable-dp-attention \
  --tp-size 8 \
  --moe-a2a-backend deepep \
  --page-size 1 \
  --host 0.0.0.0 \
  --port 9001 \
  --trust-remote-code \
  --mem-fraction-static 0.75 \
  --max-running-requests 128 \
  --chunked-prefill-size 16384 \
  --reasoning-parser qwen3 \
  --tool-call-parser mimo \
  --context-length 262144 \
  --attention-backend fa3 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --enable-mtp

# Send request
curl -i http://localhost:9001/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Nice to meet you MiMo"
    }],
    "model": "mimo-v2-flash",
    "max_tokens": 4096,
    "temperature": 0.8,
    "top_p": 0.95,
    "stream": true,
    "chat_template_kwargs": {
      "enable_thinking": true
    }
  }'
```
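For programmatic access, the launched server speaks the OpenAI-compatible chat API; a stdlib-only sketch mirroring the curl example (endpoint, parameters, and the `chat_template_kwargs` passthrough are as shown above):

```python
import json
import urllib.request

def build_chat_request(content, thinking=True, temperature=0.8):
    """Builds the same payload as the curl example above."""
    return {
        "model": "mimo-v2-flash",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 4096,
        "temperature": temperature,
        "top_p": 0.95,
        "stream": False,  # set True for server-sent events, as in the curl example
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

def send(payload, url="http://localhost:9001/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Nice to meet you MiMo")
# send(payload)  # requires the server launched above to be running
```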

### Inference with KTransformers (CPU Offloading)

[KTransformers](https://github.com/kvcache-ai/ktransformers) enables efficient MiMo-V2-Flash deployment on consumer-grade hardware by offloading MoE expert computation to the CPU, built on top of SGLang. With **4× RTX 5090 + 2× AMD EPYC 9355**, it achieves decode speeds of up to **35.7 tokens/s**.

For a quick start and benchmarks, visit [KTransformers](https://ktransformers.net/zh/benchmarks#MiMo-V2-Flash-FP8-TP4).

### Notifications

#### 1. System prompt

> [!IMPORTANT]
> The following system prompts are **HIGHLY** recommended; choose either the English or Chinese version.

English

```plaintext
You are MiMo, an AI assistant developed by Xiaomi.

Today's date: {date} {week}. Your knowledge cutoff date is December 2024.
```

Chinese

```plaintext
你是MiMo(中文名称也是MiMo),是小米公司研发的AI智能助手。

今天的日期:{date} {week},你的知识截止日期是2024年12月。
```

#### 2. Sampling parameters

> [!IMPORTANT]
> Recommended sampling parameters:
>
> `top_p=0.95`
>
> `temperature=0.8` for math, writing, and web-dev
>
> `temperature=0.3` for agentic tasks (e.g., vibe-coding, tool-use)

#### 3. Tool-use practice

> [!IMPORTANT]
> In thinking mode with multi-turn tool calls, the model returns a `reasoning_content` field alongside `tool_calls`. To continue the conversation, the user must persist all historical `reasoning_content` in the `messages` array of each subsequent request.
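A sketch of what persisting `reasoning_content` looks like in practice (the tool schema and field values here are hypothetical examples; the key point is keeping the assistant turn intact):

```python
# After the model's first turn returns reasoning_content + tool_calls,
# append that turn verbatim to the history before sending the tool result back.
messages = [
    {"role": "user", "content": "What's the weather in Beijing?"},
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "The user wants current weather; I should call the weather tool.",
        "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"},
        }],
    },
    {"role": "tool", "tool_call_id": "call_0", "content": "Sunny, 21C"},
]
# The next request's "messages" must include the assistant turn above unchanged,
# including reasoning_content, and likewise for every earlier turn.
```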

-----

## 7. Citation

If you find our work helpful, please cite our technical report:

```bibtex
@misc{mimo2025flash,
      title={MiMo-V2-Flash Technical Report},
      author={LLM-Core Xiaomi},
      year={2025},
      url={https://github.com/XiaomiMiMo/MiMo-V2-Flash/paper.pdf}
}
```

## 8. Contact

Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com), join our WeChat group below, or open an issue if you have any questions.

<p align="center">
  <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat1.jpg?raw=true" width="20%" />
  <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat2.jpg?raw=true" width="20%" />
  <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat3.jpg?raw=true" width="20%" />
  <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat4.jpg?raw=true" width="20%" />
</p>
added_tokens.json ADDED
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
config.json ADDED
@@ -0,0 +1,213 @@
{
  "architectures": [
    "MiMoV2FlashForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_mimo_v2_flash.MiMoV2FlashConfig",
    "AutoModel": "modeling_mimo_v2_flash.MiMoV2FlashModel",
    "AutoModelForCausalLM": "modeling_mimo_v2_flash.MiMoV2FlashForCausalLM"
  },
  "quantization_config": {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "packed_modules_mapping": {},
    "quant_method": "fp8",
    "ignored_layers": [
      "model.layers.0.self_attn.o_proj",
      "model.layers.1.self_attn.o_proj",
      "model.layers.2.self_attn.o_proj",
      "model.layers.3.self_attn.o_proj",
      "model.layers.4.self_attn.o_proj",
      "model.layers.5.self_attn.o_proj",
      "model.layers.6.self_attn.o_proj",
      "model.layers.7.self_attn.o_proj",
      "model.layers.8.self_attn.o_proj",
      "model.layers.9.self_attn.o_proj",
      "model.layers.10.self_attn.o_proj",
      "model.layers.11.self_attn.o_proj",
      "model.layers.12.self_attn.o_proj",
      "model.layers.13.self_attn.o_proj",
      "model.layers.14.self_attn.o_proj",
      "model.layers.15.self_attn.o_proj",
      "model.layers.16.self_attn.o_proj",
      "model.layers.17.self_attn.o_proj",
      "model.layers.18.self_attn.o_proj",
      "model.layers.19.self_attn.o_proj",
      "model.layers.20.self_attn.o_proj",
      "model.layers.21.self_attn.o_proj",
      "model.layers.22.self_attn.o_proj",
      "model.layers.23.self_attn.o_proj",
      "model.layers.24.self_attn.o_proj",
      "model.layers.25.self_attn.o_proj",
      "model.layers.26.self_attn.o_proj",
      "model.layers.27.self_attn.o_proj",
      "model.layers.28.self_attn.o_proj",
      "model.layers.29.self_attn.o_proj",
      "model.layers.30.self_attn.o_proj",
      "model.layers.31.self_attn.o_proj",
      "model.layers.32.self_attn.o_proj",
      "model.layers.33.self_attn.o_proj",
      "model.layers.34.self_attn.o_proj",
      "model.layers.35.self_attn.o_proj",
      "model.layers.36.self_attn.o_proj",
      "model.layers.37.self_attn.o_proj",
      "model.layers.38.self_attn.o_proj",
      "model.layers.39.self_attn.o_proj",
      "model.layers.40.self_attn.o_proj",
      "model.layers.41.self_attn.o_proj",
      "model.layers.42.self_attn.o_proj",
      "model.layers.43.self_attn.o_proj",
      "model.layers.44.self_attn.o_proj",
      "model.layers.45.self_attn.o_proj",
      "model.layers.46.self_attn.o_proj",
      "model.layers.47.self_attn.o_proj",
      "model.decoder.self_attn.o_proj"
    ],
    "weight_block_size": [128, 128]
  },
  "attention_dropout": 0.0,
  "attention_value_scale": 0.707,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 16384,
  "max_position_embeddings": 262144,
  "model_type": "mimo_v2_flash",
  "num_attention_heads": 64,
  "head_dim": 192,
  "num_hidden_layers": 48,
  "num_key_value_heads": 4,
  "layernorm_epsilon": 1e-05,
  "rope_theta": 5000000,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.1",
  "use_cache": true,
  "vocab_size": 152576,
  "partial_rotary_factor": 0.334,
  "sliding_window": 128,
  "swa_rope_theta": 10000,
  "attention_bias": false,
  "v_head_dim": 128,
  "hybrid_layer_pattern": [0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0],
  "add_swa_attention_sink_bias": true,
  "add_full_attention_sink_bias": false,
  "sliding_window_size": 128,
  "attention_chunk_size": 128,
  "moe_layer_freq": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
  "moe_intermediate_size": 2048,
  "n_routed_experts": 256,
  "n_shared_experts": null,
  "num_experts_per_tok": 8,
  "norm_topk_prob": true,
  "scoring_func": "sigmoid",
  "n_group": 1,
  "topk_group": 1,
  "topk_method": "noaux_tc",
  "routed_scaling_factor": null,
  "swa_num_attention_heads": 64,
  "swa_num_key_value_heads": 8,
  "swa_head_dim": 192,
  "swa_v_head_dim": 128
}
configuration_mimo_v2_flash.py ADDED
@@ -0,0 +1,109 @@
# coding=utf-8
#
# Copyright 2025 Xiaomi Corporation.
# Copyright 2025 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from transformers.configuration_utils import PretrainedConfig
from transformers.modeling_rope_utils import rope_config_validation
from transformers.utils import logging


logger = logging.get_logger(__name__)


class MiMoV2FlashConfig(PretrainedConfig):

    model_type = "mimo_v2_flash"
    keys_to_ignore_at_inference = ["past_key_values"]

    # Default tensor parallel plan for base model `Hybrid`
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.o_proj": "rowwise",
        "layers.*.mlp.gate_proj": "colwise",
        "layers.*.mlp.up_proj": "colwise",
        "layers.*.mlp.down_proj": "rowwise",
    }
    base_model_pp_plan = {
        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
        "norm": (["hidden_states"], ["hidden_states"]),
    }

    attribute_map = {
        "num_local_experts": "n_routed_experts",
    }

    def __init__(
        self,
        vocab_size=151936,
        hidden_size=4096,
        intermediate_size=22016,
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=32,
        hidden_act="silu",
        max_position_embeddings=32768,
        initializer_range=0.02,
        layernorm_epsilon=1e-6,
        use_cache=True,
        tie_word_embeddings=False,
        rope_theta=10000.0,
        rope_scaling=None,
        attention_dropout=0.0,
        hybrid_block_size=None,
        hybrid_layer_pattern=None,
        partial_rotary_factor=1.0,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads

        # for backward compatibility
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads

        self.num_key_value_heads = num_key_value_heads
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.layernorm_epsilon = layernorm_epsilon
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling
        self.attention_dropout = attention_dropout

        if hybrid_block_size is not None and hybrid_layer_pattern is None:
            hybrid_layer_pattern = [0 if ((i + 1) % hybrid_block_size == 0) else 1 for i in range(num_hidden_layers)]
        self.hybrid_block_size = hybrid_block_size
        self.hybrid_layer_pattern = hybrid_layer_pattern

        self.partial_rotary_factor = partial_rotary_factor

        # Validate the correctness of rotary position embeddings parameters
        # BC: if there is a 'type' field, move it to 'rope_type'.
        if self.rope_scaling is not None and "type" in self.rope_scaling:
            self.rope_scaling["rope_type"] = self.rope_scaling["type"]
        rope_config_validation(self)

        super().__init__(
            tie_word_embeddings=tie_word_embeddings,
            **kwargs,
        )
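For illustration, the `hybrid_block_size` fallback in `MiMoV2FlashConfig.__init__` expands into a layer pattern of the kind shipped in `config.json` (1 = SWA layer, 0 = GA layer):

```python
def make_hybrid_layer_pattern(num_hidden_layers, hybrid_block_size):
    # Mirrors the fallback above: every hybrid_block_size-th layer is global
    # attention (0); all other layers use sliding-window attention (1).
    return [0 if ((i + 1) % hybrid_block_size == 0) else 1
            for i in range(num_hidden_layers)]

pattern = make_hybrid_layer_pattern(48, 6)  # 8 blocks of 5 SWA + 1 GA
```

Note that the released `config.json` additionally marks layer 0 as global, so its shipped `hybrid_layer_pattern` differs from this fallback at index 0.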
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
model_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a38e69fd84e5dbeb007a1e999bc186cf2ee5ab4d380a2255662e9dfe62ac3c2c
+ size 324091032
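Each safetensors entry in this commit is a Git LFS pointer file, not the weight shard itself: three `key value` lines giving the spec version, the SHA-256 object id, and the blob size in bytes, per the git-lfs v1 pointer format. A minimal sketch of parsing such a pointer (the sample values are copied from `model_0.safetensors` above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs v1 pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # first space separates key from value
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a38e69fd84e5dbeb007a1e999bc186cf2ee5ab4d380a2255662e9dfe62ac3c2c
size 324091032"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # full "sha256:..." object id
print(int(info["size"]))  # → 324091032
```

The `size` field is what repo tooling uses to report shard sizes without downloading the blobs.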
model_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac9b00d805466265c6cc7208958532d2819c3d6b73e8a551cbe71f1196ef6675
+ size 132154312
model_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15a76d7cd96b8f855072b0b9b2eb2ef323f45605eab9942d1962f3b189d9ae38
+ size 132154328
model_10_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b79a6754a6ceffd3b1164847e6371fe94f74cd44b9f8a177976dff8fb25f6ada
+ size 4296144144
model_10_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a103d7e8de4de8af339e4c0bf2f5595e86901e78586acf66278067033bcf8005
+ size 2148072376
model_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d34f5fc039a11df7686fed6a46f6f43e5241a49dd8e4df1b959ae3b512d889c5
+ size 126910184
model_11_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c73f4786cc52b659e8e8c0693b8f1c70d14f6982618136b9b797c3f0a52bb5a5
+ size 4296144144
model_11_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4dab8f4037d40cd5389e1e10d7f7cd68e3c46e12a84bcb82ea8c5281457d226d
+ size 2148072376
model_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb75be363c969bc2487049d654b47418f453f8080a89eb510be9d5bd57c9620b
+ size 132154328
model_12_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37471f00a836b669b73cdb4ae7445cfb4f9ceb9b29af8869eb8f872e2a1aa780
+ size 4296144144
model_12_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5e4e58c71850df727cf05d9d6ca8897f2f2dd379bfe76d401eea957b95437df
+ size 2148072376
model_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:744ba55216e1d1f8651770266130e3a1d4d62f06fbc076c6f739c8279c7274ed
+ size 132154328
model_13_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4822f5647729e1d6056f7947c8369f798a437dd214272e89913666646c6a96b2
+ size 4296144144
model_13_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b2c3678c75f8d2a677db292a0c7dd8f2480c8d0ebe52ef45370f87dd047dc19
+ size 2148072376
model_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44c395ec24de044119c8ebe9e361e5e40323feb8e57607386c0369522a343432
+ size 132154328
model_14_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a52d904947cd7c257e3d83ddb70d6afdbb50eb7cf777331f27fd5b34ffe0b1eb
+ size 4296144144
model_14_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5c2b2dcecae4df308f2f41e2b79813414de2736012e147897baad21a5a57960
+ size 2148072376
model_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9dcccb45364d69afab25fb0837befdb80a13257a7491bdd3ba6b83b3c5a1555d
+ size 132154328
model_15_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e44d6af2fe49e8c00c207b6e5ef5cd2078413b0dde93571f7c67d65ed8129393
+ size 4296144144
model_15_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ec6cc6353c6d9b0553b2363845a9c3153a8157edbe1d4c51b8f10c69fc26b7e
+ size 2148072376
model_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61e250132d8f4f753d1ee7d5cbffce109bac0e419685757781417d500d0bcc87
+ size 132154328
model_16_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5103440d890826c0bd1ca685e5447633a4a6e95668766b35b5898aa59f30b5a0
+ size 4296144144
model_16_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:554bf20e506b7ccd94e277375e2632868a19d8d7186d4cb565a5118751e3d409
+ size 2148072376
model_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cbfd9461946facfe56dfdf14222aa8a4a3d8b19ef4c32b1c814f1b42cbee113
+ size 126910184
model_17_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db89055cc8a6649d8d36da0541558c272c87cc8fdbe88dc01767d85b4c99d41a
+ size 4296144144
model_17_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2340276698490a892d23223513e58dc476018d6e8681f07665860ec8f1c78e98
+ size 2148072376
model_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a277cf00ce1bc93147ca00f4f5fe09a72ac9ed27973e9a87960494d6ef90908a
+ size 132154328
model_18_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66dc68b04fb2c373ec9e01ba385bf454eee731e050b4f9990b78ec3292a8b366
+ size 4296144144
model_18_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbaeb46c35c2a22594c14b11264a2a91c93d1be3f8247336321a558309150d03
+ size 2148072376
model_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b6dc3861aa3176eda4cae4b82cb4a347bfb2bbddad7245202f6eff5ee89e7c9
+ size 132154328
model_19_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edccbc22110f574a7c7510f11d092877b73292dd1394e91aab2d7c77bf8eec81
+ size 4296144144
model_19_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f94702133c7733209ee97d077053e368ba6e218bab7ae243538ecff6b37ee2a5
+ size 2148072376
model_1_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de58f252388fc33c62a3cc709d98996d0d56c9046068f22aa6e9d7861294e579
+ size 4296143120
model_1_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0614837e791547d06388dc9395913f93fb6d188dbc11800b6bbf62ca0fb4ba09
+ size 2148071864
model_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63ea731b8fe60181264e89e23b6f7ae43616353b2ceb843a9194806b424c7fcf
+ size 132154312
model_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd223a5ce5bcc4cc542314eb435cc6dc4b366a7c5411e471ffaad21f6ac7b5b7
+ size 132154328
model_20_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e98dab38464b91238ebbc7dcb720a879f466b1d9eaadb817b31b123df0e8ef46
+ size 4296144144
model_20_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02f8a73128df2222145285bfcb4c77544ddabcb94dbe403d8dcfa5c817329b5f
+ size 2148072376
model_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6920b442fa917d285ef417cb5ae4d09d8b716412e04c31e128bd717957b62bda
+ size 132154328
model_21_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb5c8084d033a24f337305341e1de627e3dfd164225d04c7b3de6fb668e2bc6a
+ size 4296144144
model_21_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3cb86f8ebd086fdc2c055a68f70c7b9855e0eb9abce5f6fd8b5df87ccc2dc3a
+ size 2148072376
model_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97cc9a1d9da89630968e326e20c28da8fa8a662830b26479df5b893328fc89d2
+ size 132154328
model_22_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01a158fa9c5fa59a4891f3f81afe4344d032002adef9eb50237c676451fd5a05
+ size 4296144144