bwshen-mi committed
Commit 791db75 · verified · 1 Parent(s): 75dbb4d

Update README.md

Files changed (1)
  1. README.md +52 -52
README.md CHANGED
@@ -1,52 +1,52 @@
  ---
  license: mit
  library_name: transformers
  ---

  <div align="center">
  <picture>
  <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
  <img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Xiaomi-MiMo" />
  </picture>
  </div>

  <h3 align="center">
  <b>
  <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>
  <br/>
  Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining
  <br/>
  <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>
  <br/>
  </b>
  </h3>

  <br/>

  <div align="center" style="line-height: 1;">
  |
  <a href="https://huggingface.co/XiaomiMiMo" target="_blank">🤗 HuggingFace</a>
  &nbsp;|
  <a href="https://www.modelscope.cn/organization/XiaomiMiMo" target="_blank">🤖️ ModelScope</a>
  &nbsp;|
  <a href="https://arxiv.org/abs/2505.07608" target="_blank">📔 Technical Report</a>
  &nbsp;|
  <br/>
  </div>

  <br/>

  > This model repository is licensed under the MIT License.

  ## I. Pretrained MTPs of MiMo-7B

  This model repository contains the pretrained MTP weights of MiMo-7B (`model.mtp_layers.1` and `model.mtp_layers.2`).

- Currently, each MiMo-7B model has 1 MTP layer (`model.mtp_layers.0`). Users may load the weights of the pretrained MTPs for potential performance gains (please refer to *[Power Up Speculative Decoding In Reinforcement Learning](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e)*).
+ Currently, each MiMo-7B model has 1 MTP layer (`model.mtp_layers.0`). Users may load the weights of the pretrained MTPs for a potential rollout speedup (please refer to *[Power Up Speculative Decoding In Reinforcement Learning](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e)*).

  > [!IMPORTANT]
  > We tuned 1 MTP layer during SFT and froze it during RL; we **HAVE NOT** tested the performance of the posttrained models with the 2 additional pretrained MTP layers.

  ## II. Contact

  Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.
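For readers who want to experiment with the extra MTP layers the README describes, here is a minimal, hypothetical sketch (not part of this commit) of how one might load a MiMo-7B checkpoint with `transformers` and inspect which `model.mtp_layers.*` parameters it actually carries. The repository id is a placeholder, and `trust_remote_code=True` is assumed to be required since MiMo uses a custom architecture; consult the model card for the exact loading instructions.

```python
# Hypothetical sketch: inspect the MTP parameters of a MiMo-7B checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "XiaomiMiMo/MiMo-7B-Base",  # placeholder repo id; substitute your checkpoint
    trust_remote_code=True,     # assumed: MiMo ships a custom architecture
    torch_dtype="auto",
)

# Per the README, the shipped model carries one MTP layer (model.mtp_layers.0);
# list what this checkpoint actually exposes.
mtp_params = [name for name, _ in model.named_parameters() if "mtp_layers" in name]
print(f"found {len(mtp_params)} MTP parameters")
for name in mtp_params[:5]:
    print(" ", name)

# Note: merging the extra pretrained layers (model.mtp_layers.1 / .2) via
# load_state_dict(..., strict=False) would only take effect if the model is
# instantiated from a config that allocates three MTP layers; otherwise the
# new keys are reported as unexpected and silently ignored.
```

Inspecting the parameter names first is a cheap way to confirm how many MTP layers the loaded config allocates before attempting to merge the additional weights from this repository.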