bwshen-mi committed
Commit 791db75 · verified · 1 Parent(s): 75dbb4d

Update README.md

Files changed (1)
  1. README.md +52 -52
README.md CHANGED
@@ -1,52 +1,52 @@
  ---
  license: mit
  library_name: transformers
  ---

  <div align="center">
  <picture>
  <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
  <img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Xiaomi-MiMo" />
  </picture>
  </div>

  <h3 align="center">
  <b>
  <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>
  <br/>
  Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining
  <br/>
  <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>
  <br/>
  </b>
  </h3>

  <br/>

  <div align="center" style="line-height: 1;">
  |
  <a href="https://huggingface.co/XiaomiMiMo" target="_blank">🤗 HuggingFace</a>
  &nbsp;|
  <a href="https://www.modelscope.cn/organization/XiaomiMiMo" target="_blank">🤖️ ModelScope</a>
  &nbsp;|
  <a href="https://arxiv.org/abs/2505.07608" target="_blank">📔 Technical Report</a>
  &nbsp;|
  <br/>
  </div>

  <br/>

  > This model repository is licensed under the MIT License.

  ## I. Pretrained MTPs of MiMo-7B

  This model repository contains the pretrained MTP weights of MiMo-7B (`model.mtp_layers.1` and `model.mtp_layers.2`).

- Currently, each MiMo-7B model has 1 MTP layer (`model.mtp_layers.0`). Users may load the weights of the pretrained MTPs for potential performance gains (please refer to *[Power Up Speculative Decoding In Reinforcement Learning](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e)*).
+ Currently, each MiMo-7B model has 1 MTP layer (`model.mtp_layers.0`). Users may load the weights of the pretrained MTPs for a potential rollout speedup (please refer to *[Power Up Speculative Decoding In Reinforcement Learning](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e)*).

  > [!IMPORTANT]
  > We tuned 1 MTP layer during SFT and froze it during RL; we **HAVE NOT** tested the performance of the posttrained models with the 2 additional pretrained MTP layers.

  ## II. Contact

  Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.
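For readers who want to experiment with the extra MTP layers the README describes, here is a minimal, hypothetical sketch (not part of this commit) of how one might load a MiMo-7B checkpoint with `transformers` and inspect which `model.mtp_layers.*` parameters it actually carries. The repository id is a placeholder, and `trust_remote_code=True` is assumed to be required since MiMo uses a custom architecture; consult the model card for the exact loading instructions.

```python
# Hypothetical sketch: inspect the MTP parameters of a MiMo-7B checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "XiaomiMiMo/MiMo-7B-Base",  # placeholder repo id; substitute your checkpoint
    trust_remote_code=True,     # assumed: MiMo ships a custom architecture
    torch_dtype="auto",
)

# Per the README, the shipped model carries one MTP layer (model.mtp_layers.0);
# list what this checkpoint actually exposes.
mtp_params = [name for name, _ in model.named_parameters() if "mtp_layers" in name]
print(f"found {len(mtp_params)} MTP parameters")
for name in mtp_params[:5]:
    print(" ", name)

# Note: merging the extra pretrained layers (model.mtp_layers.1 / .2) via
# load_state_dict(..., strict=False) would only take effect if the model is
# instantiated from a config that allocates three MTP layers; otherwise the
# new keys are reported as unexpected and silently ignored.
```

Inspecting the parameter names first is a cheap way to confirm how many MTP layers the loaded config allocates before attempting to merge the additional weights from this repository.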