danielhanchen committed commit 9990294 (verified; 1 parent: b25efaf)

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes.

Files changed (50):
  1. .gitattributes +1 -0
  2. README.md +341 -0
  3. added_tokens.json +28 -0
  4. chat_template.jinja +145 -0
  5. config.json +216 -0
  6. configuration_mimo_v2_flash.py +109 -0
  7. merges.txt +0 -0
  8. model.safetensors.index.json +0 -0
  9. model_0.safetensors +3 -0
  10. model_1.safetensors +3 -0
  11. model_10.safetensors +3 -0
  12. model_10_linear_fc1.safetensors +3 -0
  13. model_10_linear_fc2.safetensors +3 -0
  14. model_11.safetensors +3 -0
  15. model_11_linear_fc1.safetensors +3 -0
  16. model_11_linear_fc2.safetensors +3 -0
  17. model_12.safetensors +3 -0
  18. model_12_linear_fc1.safetensors +3 -0
  19. model_12_linear_fc2.safetensors +3 -0
  20. model_13.safetensors +3 -0
  21. model_13_linear_fc1.safetensors +3 -0
  22. model_13_linear_fc2.safetensors +3 -0
  23. model_14.safetensors +3 -0
  24. model_14_linear_fc1.safetensors +3 -0
  25. model_14_linear_fc2.safetensors +3 -0
  26. model_15.safetensors +3 -0
  27. model_15_linear_fc1.safetensors +3 -0
  28. model_15_linear_fc2.safetensors +3 -0
  29. model_16.safetensors +3 -0
  30. model_16_linear_fc1.safetensors +3 -0
  31. model_16_linear_fc2.safetensors +3 -0
  32. model_17.safetensors +3 -0
  33. model_17_linear_fc1.safetensors +3 -0
  34. model_17_linear_fc2.safetensors +3 -0
  35. model_18.safetensors +3 -0
  36. model_18_linear_fc1.safetensors +3 -0
  37. model_18_linear_fc2.safetensors +3 -0
  38. model_19.safetensors +3 -0
  39. model_19_linear_fc1.safetensors +3 -0
  40. model_19_linear_fc2.safetensors +3 -0
  41. model_1_linear_fc1.safetensors +3 -0
  42. model_1_linear_fc2.safetensors +3 -0
  43. model_2.safetensors +3 -0
  44. model_20.safetensors +3 -0
  45. model_20_linear_fc1.safetensors +3 -0
  46. model_20_linear_fc2.safetensors +3 -0
  47. model_21.safetensors +3 -0
  48. model_21_linear_fc1.safetensors +3 -0
  49. model_21_linear_fc2.safetensors +3 -0
  50. model_22.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,341 @@
+ ---
+ tags:
+ - unsloth
+ base_model:
+ - XiaomiMiMo/MiMo-V2-Flash
+ license: mit
+ library_name: transformers
+ ---
+ > [!NOTE]
+ > Includes Unsloth **chat template fixes**! <br> For `llama.cpp`, use `--jinja`
+ >
+
+ <div>
+ <p style="margin-top: 0;margin-bottom: 0;">
+ <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
+ </p>
+ <div style="display: flex; gap: 5px; align-items: center; ">
+ <a href="https://github.com/unslothai/unsloth/">
+ <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
+ </a>
+ <a href="https://discord.gg/unsloth">
+ <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
+ </a>
+ <a href="https://docs.unsloth.ai/">
+ <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
+ </a>
+ </div>
+ </div>
+
+
+ <br/><br/>
+
+ <div align="center">
+ <picture>
+ <source srcset="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
+ <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Xiaomi-MiMo" />
+ </picture>
+ </div>
+
+ <br/>
+
+ <div align="center" style="line-height: 1;">
+ |
+ <a href="https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash" target="_blank">🤗 HuggingFace</a>
+ &nbsp;|
+ <a href="https://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf" target="_blank">📔 Technical Report </a>
+ &nbsp;|
+ <a href="https://mimo.xiaomi.com/blog/mimo-v2-flash" target="_blank">📰 Blog </a>
+ &nbsp;|
+ <br/><br/>
+ <strong>Play around!</strong> &nbsp;
+ <a href="https://aistudio.xiaomimimo.com" target="_blank">🗨️ Xiaomi MiMo Studio </a>
+ &nbsp;
+ <a href="https://platform.xiaomimimo.com/" target="_blank">🎨 Xiaomi MiMo API Platform </a>
+ </div>
+ <br/>
+
+ # MiMo-V2-Flash
+
+ **MiMo-V2-Flash** is a Mixture-of-Experts (MoE) language model with **309B total parameters** and **15B active parameters**. Designed for high-speed reasoning and agentic workflows, it utilizes a novel hybrid attention architecture and Multi-Token Prediction (MTP) to achieve state-of-the-art performance while significantly reducing inference costs.
+
+ <p align="center">
+ <img width="80%" src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/MiMo-v2-flash-performance.jpg?raw=true">
+ </p>
+
+ -----
+
+ ## 1. Introduction
+
+ MiMo-V2-Flash strikes a new balance between long-context modeling capability and inference efficiency. Key features include:
+
+ * **Hybrid Attention Architecture**: Interleaves Sliding Window Attention (SWA) and Global Attention (GA) with a 5:1 ratio and an aggressive 128-token window. This reduces KV-cache storage by nearly 6x while maintaining long-context performance via learnable **attention sink bias**.
+ * **Multi-Token Prediction (MTP)**: Equipped with a lightweight MTP module (0.33B params/block) using dense FFNs. This triples output speed during inference and accelerates rollout in RL training.
+ * **Efficient Pre-Training**: Trained on 27T tokens using FP8 mixed precision and a native 32k sequence length; the context window extends up to 256k.
+ * **Agentic Capabilities**: Post-training utilizes Multi-Teacher On-Policy Distillation (MOPD) and large-scale agentic RL, achieving superior performance on **SWE-Bench** and complex reasoning tasks.
+
+ -----
+
+ ## 2. Model Downloads
+
+ | Model | Total Params | Active Params | Context Length | Download |
+ | :--- | :---: | :---: | :---: | :---: |
+ | **MiMo-V2-Flash-Base** | 309B | 15B | 256k | [🤗 HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash-Base) |
+ | **MiMo-V2-Flash** | 309B | 15B | 256k | [🤗 HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash) |
+
+ > [!IMPORTANT]
+ > We also open-source the 3-layer MTP weights to foster community research.
+
+ -----
+
+ ## 3. Evaluation Results
+
+ ### Base Model Evaluation
+
+ MiMo-V2-Flash-Base demonstrates strong performance across standard benchmarks, surpassing models with significantly larger parameter counts.
+
+ | Category | Benchmark | Setting/Length | MiMo-V2-Flash Base | Kimi-K2 Base | DeepSeek-V3.1 Base | DeepSeek-V3.2 Exp Base |
+ | :--- | :--- | :--- | :---: | :---: | :---: | :---: |
+ | **Params** | **#Activated / #Total** | - | **15B / 309B** | **32B / 1043B** | **37B / 671B** | **37B / 671B** |
+ | **General** | BBH | 3-shot | 88.5 | 88.7 | 88.2 | 88.7 |
+ | | MMLU | 5-shot | 86.7 | 87.8 | 87.4 | 87.8 |
+ | | MMLU-Redux | 5-shot | 90.6 | 90.2 | 90.0 | 90.4 |
+ | | MMLU-Pro | 5-shot | 73.2 | 69.2 | 58.8 | 62.1 |
+ | | DROP | 3-shot | 84.7 | 83.6 | 86.3 | 86.6 |
+ | | ARC-Challenge | 25-shot | 95.9 | 96.2 | 95.6 | 95.5 |
+ | | HellaSwag | 10-shot | 88.5 | 94.6 | 89.2 | 89.4 |
+ | | WinoGrande | 5-shot | 83.8 | 85.3 | 85.9 | 85.6 |
+ | | TriviaQA | 5-shot | 80.3 | 85.1 | 83.5 | 83.9 |
+ | | GPQA-Diamond | 5-shot | 55.1 | 48.1 | 51.0 | 52.0 |
+ | | SuperGPQA | 5-shot | 41.1 | 44.7 | 42.3 | 43.6 |
+ | | SimpleQA | 5-shot | 20.6 | 35.3 | 26.3 | 27.0 |
+ | **Math** | GSM8K | 8-shot | 92.3 | 92.1 | 91.4 | 91.1 |
+ | | MATH | 4-shot | 71.0 | 70.2 | 62.6 | 62.5 |
+ | | AIME 24&25 | 2-shot | 35.3 | 31.6 | 21.6 | 24.8 |
+ | **Code** | HumanEval+ | 1-shot | 70.7 | 84.8 | 64.6 | 67.7 |
+ | | MBPP+ | 3-shot | 71.4 | 73.8 | 72.2 | 69.8 |
+ | | CRUXEval-I | 1-shot | 67.5 | 74.0 | 62.1 | 63.9 |
+ | | CRUXEval-O | 1-shot | 79.1 | 83.5 | 76.4 | 74.9 |
+ | | MultiPL-E HumanEval | 0-shot | 59.5 | 60.5 | 45.9 | 45.7 |
+ | | MultiPL-E MBPP | 0-shot | 56.7 | 58.8 | 52.5 | 50.6 |
+ | | BigCodeBench | 0-shot | 70.1 | 61.7 | 63.0 | 62.9 |
+ | | LiveCodeBench v6 | 1-shot | 30.8 | 26.3 | 24.8 | 24.9 |
+ | | SWE-Bench (AgentLess) | 3-shot | 30.8 | 28.2 | 24.8 | 9.4* |
+ | **Chinese** | C-Eval | 5-shot | 87.9 | 92.5 | 90.0 | 91.0 |
+ | | CMMLU | 5-shot | 87.4 | 90.9 | 88.8 | 88.9 |
+ | | C-SimpleQA | 5-shot | 61.5 | 77.6 | 70.9 | 68.0 |
+ | **Multilingual** | GlobalMMLU | 5-shot | 76.6 | 80.7 | 81.9 | 82.0 |
+ | | INCLUDE | 5-shot | 71.4 | 75.3 | 77.2 | 77.2 |
+ | **Long Context** | NIAH-Multi | 32K | 99.3 | 99.8 | 99.7 | 85.6* |
+ | | | 64K | 99.9 | 100.0 | 98.6 | 85.9* |
+ | | | 128K | 98.6 | 99.5 | 97.2 | 94.3* |
+ | | | 256K | 96.7 | - | - | - |
+ | | GSM-Infinite Hard | 16K | 37.7 | 34.6 | 41.5 | 50.4 |
+ | | | 32K | 33.7 | 26.1 | 38.8 | 45.2 |
+ | | | 64K | 31.5 | 16.0 | 34.7 | 32.6 |
+ | | | 128K | 29.0 | 8.8 | 28.7 | 25.7 |
+
+ > \* indicates the model may fail to follow the prompt or format.
140
+ ### Post-training Model Evaluation
141
+
142
+ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieves SOTA reasoning and agentic performance.
143
+
144
+
145
+
146
+ | Benchmark | MiMo-V2 Flash | Kimi-K2 Thinking | DeepSeek-V3.2 Thinking | Gemini-3.0 Pro | Claude Sonnet 4.5 | GPT-5 High |
147
+ | :----------------------------- | :-----------: | :--------------: | :--------------------: | :------------: | :---------------: | :--------: |
148
+ | **Reasoning** | | | | | | |
149
+ | MMLU-Pro | 84.9 | 84.6 | 85.0 | 90.1 | 88.2 | 87.5 |
150
+ | GPQA-Diamond | 83.7 | 84.5 | 82.4 | 91.9 | 83.4 | 85.7 |
151
+ | HLE (no tools) | 22.1 | 23.9 | 25.1 | 37.5 | 13.7 | 26.3 |
152
+ | AIME 2025 | 94.1 | 94.5 | 93.1 | 95.0 | 87.0 | 94.6 |
153
+ | HMMT Feb. 2025 | 84.4 | 89.4 | 92.5 | 97.5 | 79.2 | 88.3 |
154
+ | LiveCodeBench-v6 | 80.6 | 83.1 | 83.3 | 90.7 | 64.0 | 84.5 |
155
+ | **General Writing** | | | | | | |
156
+ | Arena-Hard (Hard Prompt) | 54.1 | 71.9 | 53.4 | 72.6 | 63.3 | 71.9 |
157
+ | Arena-Hard (Creative Writing) | 86.2 | 80.1 | 88.8 | 93.6 | 76.7 | 92.2 |
158
+ | **Long Context** | | | | | | |
159
+ | LongBench V2 | 60.6 | 45.1 | 58.4 | 65.6 | 61.8 | - |
160
+ | MRCR | 45.7 | 44.2 | 55.5 | 89.7 | 55.4 | - |
161
+ | **Code Agent** | | | | | | |
162
+ | SWE-Bench Verified | 73.4 | 71.3 | 73.1 | 76.2 | 77.2 | 74.9 |
163
+ | SWE-Bench Multilingual | 71.7 | 61.1 | 70.2 | - | 68.0 | 55.3 |
164
+ | Terminal-Bench Hard | 30.5 | 30.6 | 35.4 | 39.0 | 33.3 | 30.5 |
165
+ | Terminal-Bench 2.0 | 38.5 | 35.7 | 46.4 | 54.2 | 42.8 | 35.2 |
166
+ | **General Agent** | | | | | | |
167
+ | BrowseComp | 45.4 | - | 51.4 | - | 24.1 | 54.9 |
168
+ | BrowseComp (w/ Context Manage) | 58.3 | 60.2 | 67.6 | 59.2 | - | - |
169
+ | \\(\tau^2\\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |
170
+
171
+ -----
+
+ ## 4. Model Architecture
+
+ <p align="center">
+ <img width="80%" src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/MiMo-v2-flash-arch.png?raw=true">
+ </p>
+
+ ### Hybrid Sliding Window Attention
+
+ MiMo-V2-Flash addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA).
+
+ * **Configuration**: Stacks of \\(M=8\\) hybrid blocks. Each block contains \\(N=5\\) SWA layers followed by 1 GA layer.
+ * **Efficiency**: SWA layers use a window size of 128 tokens, reducing KV cache significantly.
+ * **Sink Bias**: Learnable attention sink bias is applied to maintain performance despite the aggressive window size.
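The bullets above imply a simple cache budget: in each six-layer block, the five SWA layers cache only the last 128 tokens while the single GA layer caches the full context. A back-of-the-envelope sketch (the 262144-token context length is taken from the `--context-length` value in Section 6) reproduces the "nearly 6x" KV-cache reduction quoted in the introduction:

```python
def kv_cache_reduction(context_len: int, swa_per_block: int = 5,
                       ga_per_block: int = 1, window: int = 128) -> float:
    """Ratio of all-global-attention KV cache to hybrid SWA/GA KV cache,
    counting cached token positions per block of layers."""
    layers = swa_per_block + ga_per_block
    full_cache = layers * context_len              # every layer caches the full context
    hybrid_cache = ga_per_block * context_len + swa_per_block * window
    return full_cache / hybrid_cache

print(round(kv_cache_reduction(262144), 2))  # 5.99 -- approaches 6x as context grows
```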
+
+ ### Lightweight Multi-Token Prediction (MTP)
+
+ Unlike traditional speculative decoding, our MTP module is natively integrated for training and inference.
+
+ * **Structure**: Uses a dense FFN (instead of MoE) and SWA (instead of GA) to keep the parameter count low (0.33B per block).
+ * **Performance**: Facilitates self-speculative decoding, tripling generation speed and mitigating GPU idleness during small-batch RL training.
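As a rough model of why a small draft module can triple throughput: with k draft tokens per step and an i.i.d. per-token acceptance probability a, self-speculative decoding emits 1 + a + ... + a^k tokens per verification step. Using the 3 speculative steps configured in the SGLang command in Section 6 and an assumed (illustrative, not reported) 90% acceptance rate, that is roughly 3.4 tokens per step, consistent with the ~3x speedup:

```python
def expected_tokens_per_step(k: int, a: float) -> float:
    """Expected tokens emitted per verification step with k draft tokens and
    i.i.d. per-token acceptance probability a: 1 + a + a**2 + ... + a**k
    (the leading 1 is the token the verifier always produces)."""
    return sum(a ** i for i in range(k + 1))

print(round(expected_tokens_per_step(3, 0.9), 2))  # 3.44 under the assumed 90% rate
```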
+
+ -----
+
+ ## 5. Post-Training Technical Highlights
+
+ MiMo-V2-Flash leverages a post-training pipeline designed to maximize reasoning and agentic capabilities through innovative distillation and reinforcement learning strategies.
+
+ ### 5.1 Multi-Teacher On-Policy Distillation (MOPD)
+
+ We introduce **Multi-Teacher On-Policy Distillation (MOPD)**, a new paradigm that formulates knowledge distillation as a reinforcement learning process.
+ * **Dense Token-Level Guidance**: Unlike methods relying on sparse sequence-level feedback, MOPD utilizes domain-specific expert models (teachers) to provide supervision at every token position.
+ * **On-Policy Optimization**: The student model learns from its own generated responses rather than a fixed dataset. This eliminates exposure bias and ensures smaller, more stable gradient updates.
+ * **Inherent Reward Robustness**: Rewards are derived from the distribution divergence between student and teacher, making the process naturally resistant to reward hacking.
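A minimal sketch of the token-level signal this describes: if the reward at each position is the negative reverse KL divergence between the student's and teacher's next-token distributions, it is maximal (zero) exactly when the student matches the teacher, which is what makes it hard to game. The precise reward shaping used by MOPD is not specified here; this formulation is illustrative:

```python
import math

def token_reward(student_probs, teacher_probs, eps: float = 1e-12) -> float:
    """Negative reverse KL(student || teacher) at one token position:
    0.0 when the distributions match, increasingly negative as they diverge."""
    kl = sum(s * math.log((s + eps) / (t + eps))
             for s, t in zip(student_probs, teacher_probs) if s > 0)
    return -kl

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
assert token_reward(uniform, uniform) == 0.0  # perfect agreement is the optimum
assert token_reward(peaked, uniform) < 0      # divergence from the teacher is penalized
```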
+
+ ### 5.2 Scaling Agentic RL
+
+ We significantly scale up the agentic training environments to improve intelligence and generalization.
+ * **Massive Code Agent Environments**: We utilize real-world GitHub issues to create over 100,000 verifiable tasks. Our automated pipeline maintains a Kubernetes cluster capable of running over 10,000 concurrent pods with a 70% environment setup success rate.
+ * **Multimodal Verifier for WebDev**: For web development tasks, we employ a vision-based verifier that evaluates code execution via recorded videos rather than static screenshots. This reduces visual hallucination and ensures functional correctness.
+ * **Cross-Domain Generalization**: Our experiments show that large-scale RL training on code agents effectively generalizes to other domains, boosting performance in Math and General Agent tasks.
+
+ ### 5.3 Advanced RL Infrastructure
+
+ To support high-throughput RL training for large-scale MoE models, we implemented several infrastructure optimizations on top of SGLang and Megatron-LM.
+ * **Rollout Routing Replay (R3)**: Addresses numerical precision inconsistencies in MoE routing between inference and training. R3 reuses the exact routed experts from rollout during the training pass, ensuring consistency with negligible overhead.
+ * **Request-Level Prefix Cache**: In multi-turn agent training, this cache stores KV states and routed experts from prior turns. It avoids re-computation and ensures sampling consistency across turns.
+ * **Fine-Grained Data Scheduler**: We extend the rollout engine to schedule fine-grained sequences instead of micro-batches. Combined with partial rollout, this significantly reduces GPU idleness caused by long-tail stragglers.
+ * **Toolbox & Tool Manager**: A two-layer design using Ray actor pools to handle resource contention. It eliminates cold-start delays for tool execution and isolates task logic from system policies.
+
+ -----
+
+ ## 6. Inference & Deployment
+
+ MiMo-V2-Flash supports FP8 mixed precision inference. We recommend using **SGLang** for optimal performance.
+
+ ### Quick Start with SGLang
+
+ ```bash
+ pip install sglang
+
+ # Launch server
+ python3 -m sglang.launch_server \
+   --model-path XiaomiMiMo/MiMo-V2-Flash \
+   --served-model-name mimo-v2-flash \
+   --pp-size 1 \
+   --dp-size 2 \
+   --enable-dp-attention \
+   --tp-size 8 \
+   --moe-a2a-backend deepep \
+   --page-size 1 \
+   --host 0.0.0.0 \
+   --port 9001 \
+   --trust-remote-code \
+   --mem-fraction-static 0.75 \
+   --max-running-requests 128 \
+   --chunked-prefill-size 16384 \
+   --reasoning-parser qwen3 \
+   --tool-call-parser mimo \
+   --context-length 262144 \
+   --attention-backend fa3 \
+   --speculative-algorithm EAGLE \
+   --speculative-num-steps 3 \
+   --speculative-eagle-topk 1 \
+   --speculative-num-draft-tokens 4 \
+   --enable-mtp
+
+ # Send request
+ curl -i http://localhost:9001/v1/chat/completions \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "messages": [{
+       "role": "user",
+       "content": "Nice to meet you MiMo"
+     }],
+     "model": "mimo-v2-flash",
+     "max_tokens": 4096,
+     "temperature": 0.8,
+     "top_p": 0.95,
+     "stream": true,
+     "chat_template_kwargs": {
+       "enable_thinking": true
+     }
+   }'
+ ```
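The server above exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so the same request can be issued from any HTTP client. The payload below mirrors the `curl` call exactly; only the helper function name is our own:

```python
import json

def build_chat_request(prompt: str, enable_thinking: bool = True) -> str:
    """Serialize a chat-completions request body matching the curl example."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "model": "mimo-v2-flash",
        "max_tokens": 4096,
        "temperature": 0.8,
        "top_p": 0.95,
        "stream": True,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return json.dumps(payload)

body = build_chat_request("Nice to meet you MiMo")
# POST `body` to http://localhost:9001/v1/chat/completions with
# Content-Type: application/json, e.g. via `requests` or the openai SDK.
```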
+
+ ### Usage Notes
+
+ #### 1. System prompt
+
+ > [!IMPORTANT]
+ > The following system prompts are **HIGHLY** recommended; choose either the English or the Chinese version.
+
+ English
+
+ ```plaintext
+ You are MiMo, an AI assistant developed by Xiaomi.
+
+ Today's date: {date} {week}. Your knowledge cutoff date is December 2024.
+ ```
+
+ Chinese
+
+ ```plaintext
+ 你是MiMo(中文名称也是MiMo),是小米公司研发的AI智能助手。
+
+ 今天的日期:{date} {week},你的知识截止日期是2024年12月。
+ ```
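The `{date}` and `{week}` placeholders are meant to be filled at request time. A minimal way to do that in Python (the ISO date and English weekday formats are our assumption; the card does not pin down an exact format):

```python
from datetime import date

SYSTEM_PROMPT = (
    "You are MiMo, an AI assistant developed by Xiaomi.\n\n"
    "Today's date: {date} {week}. Your knowledge cutoff date is December 2024."
)

def render_system_prompt(today: date) -> str:
    # Assumed formats: ISO date and English weekday name.
    return SYSTEM_PROMPT.format(date=today.isoformat(), week=today.strftime("%A"))

print(render_system_prompt(date(2025, 1, 1)))
```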
+
+ #### 2. Sampling parameters
+
+ > [!IMPORTANT]
+ > Recommended sampling parameters:
+ >
+ > `top_p=0.95`
+ >
+ > `temperature=0.8` for math, writing, and web-dev
+ >
+ > `temperature=0.3` for agentic tasks (e.g., vibe-coding, tool-use)
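When driving the API programmatically, these recommendations fit in a tiny helper (the task labels are our own illustrative grouping of the categories above):

```python
def recommended_sampling(task: str) -> dict:
    """Map a task type to the sampling parameters recommended above."""
    agentic = {"vibe-coding", "tool-use", "agent"}
    temperature = 0.3 if task in agentic else 0.8  # 0.8 for math/writing/web-dev
    return {"temperature": temperature, "top_p": 0.95}

print(recommended_sampling("tool-use"))  # {'temperature': 0.3, 'top_p': 0.95}
print(recommended_sampling("math"))      # {'temperature': 0.8, 'top_p': 0.95}
```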
+
+ #### 3. Tool-use practice
+
+ > [!IMPORTANT]
+ > In thinking mode with multi-turn tool calls, the model returns a `reasoning_content` field alongside `tool_calls`. To continue the conversation, persist all prior `reasoning_content` entries in the `messages` array of each subsequent request.
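Concretely, a client replaying history should append each assistant turn with its `reasoning_content` intact before adding the tool result. Field names follow the OpenAI-style schema served by SGLang above; the tool call itself is a hypothetical example:

```python
def append_assistant_turn(messages: list, response_message: dict) -> list:
    """Persist the assistant turn, keeping reasoning_content so the next
    request can replay the model's thinking across tool calls."""
    turn = {"role": "assistant", "content": response_message.get("content") or ""}
    if response_message.get("reasoning_content"):
        turn["reasoning_content"] = response_message["reasoning_content"]
    if response_message.get("tool_calls"):
        turn["tool_calls"] = response_message["tool_calls"]
    messages.append(turn)
    return messages

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]
append_assistant_turn(messages, {
    "content": "",
    "reasoning_content": "I should call the weather tool.",   # hypothetical
    "tool_calls": [{"function": {"name": "get_weather",       # hypothetical tool
                                 "arguments": "{\"city\": \"Beijing\"}"}}],
})
messages.append({"role": "tool", "content": "Sunny, 25°C"})
```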
+
+ -----
+
+ ## 7. Citation
+
+ If you find our work helpful, please cite our technical report:
+
+ ```bibtex
+ @misc{mimo2025flash,
+   title={MiMo-V2-Flash Technical Report},
+   author={LLM-Core Xiaomi},
+   year={2025},
+   url={https://github.com/XiaomiMiMo/MiMo-V2-Flash/paper.pdf}
+ }
+ ```
+
+ ## 8. Contact
+
+ Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com), join our WeChat group below, or open an issue if you have any questions.
+
+ <p align="center">
+ <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat1.jpg?raw=true" width="20%" />
+ <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat2.jpg?raw=true" width="20%" />
+ <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat3.jpg?raw=true" width="20%" />
+ <img src="https://github.com/XiaomiMiMo/MiMo-V2-Flash/raw/main/figures/wechat_group/wechat4.jpg?raw=true" width="20%" />
+ </p>
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
chat_template.jinja ADDED
@@ -0,0 +1,145 @@
+ {# Unsloth template fixes #}
+ {%- if not add_generation_prompt is defined -%}
+ {%- set add_generation_prompt = false -%}
+ {%- endif -%}
+ {%- if not enable_thinking is defined -%}
+ {%- set enable_thinking = false -%}
+ {%- endif -%}
+ {%- if not keep_all_reasoning is defined -%}
+ {%- set keep_all_reasoning = false -%}
+ {%- endif -%}
+ {%- macro render_extra_keys(json_dict, handled_keys) -%}
+ {%- if json_dict is mapping %}
+ {%- for json_key in json_dict if json_key not in handled_keys %}
+ {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
+ {%- else %}
+ {{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
+ {%- endif %}
+ {%- endfor %}
+ {%- endif %}
+ {%- endmacro -%}
+ {%- if messages[0]["role"] == "system" %}
+ {%- set system_message = messages[0]["content"] %}
+ {%- set loop_messages = messages[1:] %}
+ {%- else %}
+ {%- set loop_messages = messages %}
+ {%- endif %}
+ {%- set ns = namespace(last_user_index=-1) %}
+ {%- for m in loop_messages %}
+ {%- if m.role == 'user' %}
+ {%- set ns.last_user_index = loop.index0 -%}
+ {%- endif %}
+ {%- endfor %}
+ {%- if not tools is defined %}
+ {%- set tools = [] %}
+ {%- endif %}
+ {%- if system_message is defined %}
+ {{- "<|im_start|>system\n" + system_message }}
+ {%- else %}
+ {{- "<|im_start|>system\nYou are MiMo, a helpful AI assistant engineered by Xiaomi." }}
+ {%- endif %}
+ {%- if tools is iterable and tools | length > 0 %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou have access to the following functions:\n\n" }}
+ {{- "<tools>" }}
+ {%- for tool in tools %}
+ {%- if tool.function is defined %}
+ {%- set tool = tool.function %}
+ {%- endif %}
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
+ {%- if tool.description is defined %}
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {{- '\n<parameters>' }}
+ {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
+ {{- '\n<parameter>' }}
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
+ {%- if param_fields.type is defined %}
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
+ {%- endif %}
+ {%- if param_fields.description is defined %}
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {%- set handled_keys = ['name', 'type', 'description'] %}
+ {{- render_extra_keys(param_fields, handled_keys) }}
+ {{- '\n</parameter>' }}
+ {%- endfor %}
+ {%- endif %}
+ {%- set handled_keys = ['type', 'properties'] %}
+ {{- render_extra_keys(tool.parameters, handled_keys) }}
+ {{- '\n</parameters>' }}
+ {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
+ {{- render_extra_keys(tool, handled_keys) }}
+ {{- '\n</function>' }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nFor each function call, output the function name and arguments in the following format:\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>value_1</parameter>\n<parameter=example_parameter_2>This is the value for the second parameter\nthat can span\nmultiple lines</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- DO NOT use function calls inside <think></think> tags.\n- The value enclosed between parameter tags is preserved exactly as-is, including newlines and spaces.\n</IMPORTANT>' }}
+ {%- endif %}
+ {{- '<|im_end|>' }}
+ {%- for message in loop_messages %}
+ {%- if message.content is string %}
+ {%- set content = message.content %}
+ {%- else %}
+ {%- set content = '' %}
+ {%- endif %}
+ {%- if message.role == "assistant" %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- set reasoning_content = '' %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
+ {%- set content = content.split('</think>')[-1] %}
+ {%- endif %}
+ {%- endif %}
+ {%- if (keep_all_reasoning or loop.index0 > ns.last_user_index) and reasoning_content -%}
+ {{- '<|im_start|>' + message.role + '\n<think>' + reasoning_content + '</think>' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n<think></think>' + content }}
+ {%- endif %}
+ {%- if message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- if tool_call.arguments is defined %}{%- if tool_call.arguments is mapping %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>' }}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+ {{- args_value }}
+ {{- '</parameter>\n' }}
+ {%- endfor %}{%- endif %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>' }}
+ {%- elif message.role == "user" or message.role == "system"%}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>tool\n' }}
+ {%- endif %}
+ {{- '<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>\n' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>' }}
+ {%- endif %}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if not enable_thinking -%}
+ {{- '<think></think>' -}}
+ {%- else -%}
+ {{- '' -}}
+ {%- endif -%}
+ {%- endif %}
+ {# Copyright 2025-present Unsloth. Apache 2.0 License. #}
config.json ADDED
@@ -0,0 +1,216 @@
+ {
+   "add_full_attention_sink_bias": false,
+   "add_swa_attention_sink_bias": true,
+   "architectures": [
+     "MiMoV2FlashForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_chunk_size": 128,
+   "attention_dropout": 0.0,
+   "attention_value_scale": 0.707,
+   "auto_map": {
+     "AutoConfig": "configuration_mimo_v2_flash.MiMoV2FlashConfig",
+     "AutoModel": "modeling_mimo_v2_flash.MiMoV2FlashModel",
+     "AutoModelForCausalLM": "modeling_mimo_v2_flash.MiMoV2FlashForCausalLM"
+   },
+   "torch_dtype": "bfloat16",
+   "head_dim": 192,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "hybrid_block_size": null,
+   "hybrid_layer_pattern": [
+     0,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     0
+   ],
+   "initializer_range": 0.02,
+   "intermediate_size": 16384,
+   "layernorm_epsilon": 1e-05,
+   "max_position_embeddings": 262144,
+   "moe_intermediate_size": 2048,
+   "moe_layer_freq": [
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1
+   ],
+   "n_group": 1,
+   "n_routed_experts": 256,
+   "n_shared_experts": null,
+   "norm_topk_prob": true,
+   "num_attention_heads": 64,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 48,
+   "pad_token_id": 151654,
+   "num_key_value_heads": 4,
+   "partial_rotary_factor": 0.334,
+   "quantization_config": {
+     "activation_scheme": "dynamic",
+     "fmt": "e4m3",
+     "ignored_layers": [
+       "model.layers.0.self_attn.o_proj",
+       "model.layers.1.self_attn.o_proj",
+       "model.layers.2.self_attn.o_proj",
+       "model.layers.3.self_attn.o_proj",
+       "model.layers.4.self_attn.o_proj",
+       "model.layers.5.self_attn.o_proj",
+       "model.layers.6.self_attn.o_proj",
+       "model.layers.7.self_attn.o_proj",
+       "model.layers.8.self_attn.o_proj",
+       "model.layers.9.self_attn.o_proj",
+       "model.layers.10.self_attn.o_proj",
+       "model.layers.11.self_attn.o_proj",
+       "model.layers.12.self_attn.o_proj",
+       "model.layers.13.self_attn.o_proj",
+       "model.layers.14.self_attn.o_proj",
+       "model.layers.15.self_attn.o_proj",
+       "model.layers.16.self_attn.o_proj",
+       "model.layers.17.self_attn.o_proj",
+       "model.layers.18.self_attn.o_proj",
+       "model.layers.19.self_attn.o_proj",
+       "model.layers.20.self_attn.o_proj",
+       "model.layers.21.self_attn.o_proj",
+       "model.layers.22.self_attn.o_proj",
+       "model.layers.23.self_attn.o_proj",
+       "model.layers.24.self_attn.o_proj",
+       "model.layers.25.self_attn.o_proj",
+       "model.layers.26.self_attn.o_proj",
+       "model.layers.27.self_attn.o_proj",
+       "model.layers.28.self_attn.o_proj",
+       "model.layers.29.self_attn.o_proj",
+       "model.layers.30.self_attn.o_proj",
+       "model.layers.31.self_attn.o_proj",
+       "model.layers.32.self_attn.o_proj",
+       "model.layers.33.self_attn.o_proj",
+       "model.layers.34.self_attn.o_proj",
+       "model.layers.35.self_attn.o_proj",
+       "model.layers.36.self_attn.o_proj",
+       "model.layers.37.self_attn.o_proj",
+       "model.layers.38.self_attn.o_proj",
+       "model.layers.39.self_attn.o_proj",
+       "model.layers.40.self_attn.o_proj",
+       "model.layers.41.self_attn.o_proj",
+       "model.layers.42.self_attn.o_proj",
+       "model.layers.43.self_attn.o_proj",
+       "model.layers.44.self_attn.o_proj",
+       "model.layers.45.self_attn.o_proj",
+       "model.layers.46.self_attn.o_proj",
+       "model.layers.47.self_attn.o_proj",
+       "model.decoder.self_attn.o_proj"
+     ],
+     "packed_modules_mapping": {},
+     "quant_method": "fp8",
+     "weight_block_size": [
+       128,
+       128
+     ]
+   },
+   "rope_scaling": null,
+   "rope_theta": 5000000,
+   "routed_scaling_factor": null,
+   "scoring_func": "sigmoid",
+   "sliding_window": 128,
+   "sliding_window_size": 128,
+   "swa_head_dim": 192,
+   "swa_num_attention_heads": 64,
+   "swa_num_key_value_heads": 8,
+   "swa_rope_theta": 10000,
+   "swa_v_head_dim": 128,
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "topk_method": "noaux_tc",
+   "transformers_version": "4.57.3",
+   "unsloth_fixed": true,
+   "use_cache": true,
+   "v_head_dim": 128,
+   "vocab_size": 152576
+ }
configuration_mimo_v2_flash.py ADDED
@@ -0,0 +1,109 @@
+ # coding=utf-8
+ #
+ # Copyright 2025 Xiaomi Corporation.
+ # Copyright 2025 The HuggingFace Inc. team.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.modeling_rope_utils import rope_config_validation
+ from transformers.utils import logging
+
+
+ logger = logging.get_logger(__name__)
+
+
+ class MiMoV2FlashConfig(PretrainedConfig):
+
+     model_type = ""
+     keys_to_ignore_at_inference = ["past_key_values"]
+
+     # Default tensor parallel plan for base model `Hybrid`
+     base_model_tp_plan = {
+         "layers.*.self_attn.q_proj": "colwise",
+         "layers.*.self_attn.k_proj": "colwise",
+         "layers.*.self_attn.v_proj": "colwise",
+         "layers.*.self_attn.o_proj": "rowwise",
+         "layers.*.mlp.gate_proj": "colwise",
+         "layers.*.mlp.up_proj": "colwise",
+         "layers.*.mlp.down_proj": "rowwise",
+     }
+     base_model_pp_plan = {
+         "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+         "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+         "norm": (["hidden_states"], ["hidden_states"]),
+     }
+
+     attribute_map = {
+         "num_local_experts": "n_routed_experts",
+     }
+
+     def __init__(
+         self,
+         vocab_size=151936,
+         hidden_size=4096,
+         intermediate_size=22016,
+         num_hidden_layers=32,
+         num_attention_heads=32,
+         num_key_value_heads=32,
+         hidden_act="silu",
+         max_position_embeddings=32768,
+         initializer_range=0.02,
+         layernorm_epsilon=1e-6,
+         use_cache=True,
+         tie_word_embeddings=False,
+         rope_theta=10000.0,
+         rope_scaling=None,
+         attention_dropout=0.0,
+         hybrid_block_size=None,
+         hybrid_layer_pattern=None,
+         partial_rotary_factor=1.0,
+         **kwargs,
+     ):
+         self.vocab_size = vocab_size
+         self.max_position_embeddings = max_position_embeddings
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+
+         # for backward compatibility
+         if num_key_value_heads is None:
+             num_key_value_heads = num_attention_heads
+
+         self.num_key_value_heads = num_key_value_heads
+         self.hidden_act = hidden_act
+         self.initializer_range = initializer_range
+         self.layernorm_epsilon = layernorm_epsilon
+         self.use_cache = use_cache
+         self.rope_theta = rope_theta
+         self.rope_scaling = rope_scaling
+         self.attention_dropout = attention_dropout
+
+         if hybrid_block_size is not None and hybrid_layer_pattern is None:
+             hybrid_layer_pattern = [0 if ((i + 1) % hybrid_block_size == 0) else 1 for i in range(num_hidden_layers)]
+         self.hybrid_block_size = hybrid_block_size
+         self.hybrid_layer_pattern = hybrid_layer_pattern
+
+         self.partial_rotary_factor = partial_rotary_factor
+
+         # Validate the correctness of rotary position embeddings parameters
+         # BC: if there is a 'type' field, move it to 'rope_type'.
+         if self.rope_scaling is not None and "type" in self.rope_scaling:
+             self.rope_scaling["rope_type"] = self.rope_scaling["type"]
+         rope_config_validation(self)
+
+         super().__init__(
+             tie_word_embeddings=tie_word_embeddings,
+             **kwargs,
+         )
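For reference, the hybrid layer pattern in `MiMoV2FlashConfig.__init__` marks every `hybrid_block_size`-th layer with a 0 and all others with a 1. A minimal sketch of that list comprehension, using the 48-layer depth from the config above with an assumed block size of 6 (the block size is not stated in this fragment):

```python
# Sketch of the hybrid_layer_pattern derivation from MiMoV2FlashConfig.
# num_hidden_layers = 48 comes from the config above; hybrid_block_size = 6
# is an illustrative assumption, not a value taken from this checkpoint.
num_hidden_layers = 48
hybrid_block_size = 6

hybrid_layer_pattern = [
    0 if ((i + 1) % hybrid_block_size == 0) else 1
    for i in range(num_hidden_layers)
]

print(hybrid_layer_pattern[:6])  # -> [1, 1, 1, 1, 1, 0]
```

With these values, every sixth layer is flagged 0, giving 8 such layers across the 48-layer stack.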
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
model_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a38e69fd84e5dbeb007a1e999bc186cf2ee5ab4d380a2255662e9dfe62ac3c2c
+ size 324091032
model_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac9b00d805466265c6cc7208958532d2819c3d6b73e8a551cbe71f1196ef6675
+ size 132154312
model_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15a76d7cd96b8f855072b0b9b2eb2ef323f45605eab9942d1962f3b189d9ae38
+ size 132154328
model_10_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b79a6754a6ceffd3b1164847e6371fe94f74cd44b9f8a177976dff8fb25f6ada
+ size 4296144144
model_10_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a103d7e8de4de8af339e4c0bf2f5595e86901e78586acf66278067033bcf8005
+ size 2148072376
model_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d34f5fc039a11df7686fed6a46f6f43e5241a49dd8e4df1b959ae3b512d889c5
+ size 126910184
model_11_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c73f4786cc52b659e8e8c0693b8f1c70d14f6982618136b9b797c3f0a52bb5a5
+ size 4296144144
model_11_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4dab8f4037d40cd5389e1e10d7f7cd68e3c46e12a84bcb82ea8c5281457d226d
+ size 2148072376
model_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb75be363c969bc2487049d654b47418f453f8080a89eb510be9d5bd57c9620b
+ size 132154328
model_12_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37471f00a836b669b73cdb4ae7445cfb4f9ceb9b29af8869eb8f872e2a1aa780
+ size 4296144144
model_12_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5e4e58c71850df727cf05d9d6ca8897f2f2dd379bfe76d401eea957b95437df
+ size 2148072376
model_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:744ba55216e1d1f8651770266130e3a1d4d62f06fbc076c6f739c8279c7274ed
+ size 132154328
model_13_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4822f5647729e1d6056f7947c8369f798a437dd214272e89913666646c6a96b2
+ size 4296144144
model_13_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b2c3678c75f8d2a677db292a0c7dd8f2480c8d0ebe52ef45370f87dd047dc19
+ size 2148072376
model_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44c395ec24de044119c8ebe9e361e5e40323feb8e57607386c0369522a343432
+ size 132154328
model_14_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a52d904947cd7c257e3d83ddb70d6afdbb50eb7cf777331f27fd5b34ffe0b1eb
+ size 4296144144
model_14_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5c2b2dcecae4df308f2f41e2b79813414de2736012e147897baad21a5a57960
+ size 2148072376
model_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9dcccb45364d69afab25fb0837befdb80a13257a7491bdd3ba6b83b3c5a1555d
+ size 132154328
model_15_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e44d6af2fe49e8c00c207b6e5ef5cd2078413b0dde93571f7c67d65ed8129393
+ size 4296144144
model_15_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ec6cc6353c6d9b0553b2363845a9c3153a8157edbe1d4c51b8f10c69fc26b7e
+ size 2148072376
model_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61e250132d8f4f753d1ee7d5cbffce109bac0e419685757781417d500d0bcc87
+ size 132154328
model_16_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5103440d890826c0bd1ca685e5447633a4a6e95668766b35b5898aa59f30b5a0
+ size 4296144144
model_16_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:554bf20e506b7ccd94e277375e2632868a19d8d7186d4cb565a5118751e3d409
+ size 2148072376
model_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cbfd9461946facfe56dfdf14222aa8a4a3d8b19ef4c32b1c814f1b42cbee113
+ size 126910184
model_17_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db89055cc8a6649d8d36da0541558c272c87cc8fdbe88dc01767d85b4c99d41a
+ size 4296144144
model_17_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2340276698490a892d23223513e58dc476018d6e8681f07665860ec8f1c78e98
+ size 2148072376
model_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a277cf00ce1bc93147ca00f4f5fe09a72ac9ed27973e9a87960494d6ef90908a
+ size 132154328
model_18_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66dc68b04fb2c373ec9e01ba385bf454eee731e050b4f9990b78ec3292a8b366
+ size 4296144144
model_18_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbaeb46c35c2a22594c14b11264a2a91c93d1be3f8247336321a558309150d03
+ size 2148072376
model_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b6dc3861aa3176eda4cae4b82cb4a347bfb2bbddad7245202f6eff5ee89e7c9
+ size 132154328
model_19_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edccbc22110f574a7c7510f11d092877b73292dd1394e91aab2d7c77bf8eec81
+ size 4296144144
model_19_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f94702133c7733209ee97d077053e368ba6e218bab7ae243538ecff6b37ee2a5
+ size 2148072376
model_1_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de58f252388fc33c62a3cc709d98996d0d56c9046068f22aa6e9d7861294e579
+ size 4296143120
model_1_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0614837e791547d06388dc9395913f93fb6d188dbc11800b6bbf62ca0fb4ba09
+ size 2148071864
model_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63ea731b8fe60181264e89e23b6f7ae43616353b2ceb843a9194806b424c7fcf
+ size 132154312
model_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd223a5ce5bcc4cc542314eb435cc6dc4b366a7c5411e471ffaad21f6ac7b5b7
+ size 132154328
model_20_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e98dab38464b91238ebbc7dcb720a879f466b1d9eaadb817b31b123df0e8ef46
+ size 4296144144
model_20_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02f8a73128df2222145285bfcb4c77544ddabcb94dbe403d8dcfa5c817329b5f
+ size 2148072376
model_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6920b442fa917d285ef417cb5ae4d09d8b716412e04c31e128bd717957b62bda
+ size 132154328
model_21_linear_fc1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb5c8084d033a24f337305341e1de627e3dfd164225d04c7b3de6fb668e2bc6a
+ size 4296144144
model_21_linear_fc2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3cb86f8ebd086fdc2c055a68f70c7b9855e0eb9abce5f6fd8b5df87ccc2dc3a
+ size 2148072376
model_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97cc9a1d9da89630968e326e20c28da8fa8a662830b26479df5b893328fc89d2
+ size 132154328
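Each `model_*.safetensors` entry above is a Git LFS pointer file, not the weights themselves: three lines giving the spec version, the SHA-256 of the real blob, and its byte size. A minimal sketch of parsing such a pointer (the sample text is copied from the `model_0.safetensors` entry above; `parse_lfs_pointer` is an illustrative helper, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields.

    Each line is "<key> <value>"; "size" is converted to int so the
    caller can compare it against the downloaded blob's length.
    """
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


# Pointer contents taken verbatim from model_0.safetensors above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a38e69fd84e5dbeb007a1e999bc186cf2ee5ab4d380a2255662e9dfe62ac3c2c
size 324091032
"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # -> 324091032
```

After fetching the actual blob, verifying it amounts to checking `hashlib.sha256(blob).hexdigest()` against the hex digest after the `sha256:` prefix and `len(blob)` against `size`.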