---
library_name: transformers
---

# tiny-mimo-v2-flash

A tiny (~2.34B-parameter) random-weight checkpoint of [XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash), used for internal testing of the native Hugging Face `transformers` implementation.
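Because the weights are random, the checkpoint is only useful for smoke tests (shapes, dtypes, generation plumbing), not for meaningful generation. Below is a minimal loading sketch; the repo id `<org>/tiny-mimo-v2-flash` is a hypothetical placeholder, and the snippet assumes your `transformers` build already ships the native MiMo-V2-Flash implementation.

```python
# Minimal smoke-test sketch. The repo id is a hypothetical placeholder;
# substitute the actual Hub path of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<org>/tiny-mimo-v2-flash"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Weights are random, so the outputs are meaningless; we only exercise
# the forward pass and check output shapes.
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (batch, seq_len, vocab_size)
```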

## Configuration

| Hyperparameter | Tiny checkpoint | Original MiMo-V2-Flash |
|----------------|-----------------|------------------------|
| `num_hidden_layers` | 5 | 48 |
| `layer_types` | `[full, sliding×4]` | matches pattern |
| `mlp_layer_types` | `[dense, sparse×4]` | matches pattern (layer 0 dense, rest MoE) |
| `hidden_size` | 2048 | 4096 (**ratio 2.0**) |
| `intermediate_size` | 8192 | 16384 (**ratio 2.0**) |
| `moe_intermediate_size` | 1024 | 2048 (**ratio 2.0**) |
| `num_attention_heads` / `num_key_value_heads` | 16 / 1 | 64 / 4 (**ratio 4.0**) |
| `head_dim` / `v_head_dim` | 192 / 128 | 192 / 128 |
| `n_routed_experts` / `num_experts_per_tok` | 64 / 2 | 256 / 8 (**ratio 4.0**) |
| Total parameters | 2.34B | 300B |
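
The scaled-down values can be sanity-checked straight from `config.json`, without loading any weights. A sketch assuming the same hypothetical repo id as above and that the table's keys are exposed verbatim as config attributes:

```python
# Verify the scaled-down hyperparameters against the table above.
# AutoConfig only downloads and parses config.json, so no weights are loaded.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("<org>/tiny-mimo-v2-flash")  # hypothetical repo id

expected = {
    "num_hidden_layers": 5,
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "moe_intermediate_size": 1024,
    "num_attention_heads": 16,
    "num_key_value_heads": 1,
    "head_dim": 192,
    "v_head_dim": 128,
    "n_routed_experts": 64,
    "num_experts_per_tok": 2,
}
for name, want in expected.items():
    got = getattr(config, name)
    assert got == want, f"{name}: got {got}, expected {want}"
print("tiny config matches the table")
```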