---
library_name: transformers
---
# tiny-mimo-v2-flash
A ~2.34B-parameter tiny random-weight checkpoint of [XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash), used for internal testing of the native Hugging Face `transformers` implementation.
## Configuration
| Hyperparameter | Value | Original MiMo-V2-Flash |
|----------------|-------|------------------------|
| `num_hidden_layers` | 5 | 48 |
| `layer_types` | `[full, sliding×4]` | matches pattern |
| `mlp_layer_types` | `[dense, sparse×4]` | matches pattern (layer 0 dense, rest MoE) |
| `hidden_size` | 2048 | 4096 (**ratio 2.0**) |
| `intermediate_size` | 8192 | 16384 (**ratio 2.0**) |
| `moe_intermediate_size` | 1024 | 2048 (**ratio 2.0**) |
| `num_attention_heads` / `num_key_value_heads` | 16 / 1 | 64 / 4 (**ratio 4.0**) |
| `head_dim` / `v_head_dim` | 192 / 128 | 192 / 128 |
| `n_routed_experts` / `num_experts_per_tok` | 64 / 2 | 256 / 8 (**ratio 4.0**) |
| total parameters | 2.34B | 300B |
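Tiny random-weight checkpoints like this one are usually built by shrinking the original config and instantiating the model from it, without loading any pretrained weights. A minimal sketch of that pattern, using `LlamaConfig`/`LlamaForCausalLM` as a stand-in (the actual checkpoint uses the MiMo-V2-Flash config class; the hyperparameter values below are illustrative, not the ones in the table):

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Stand-in config: a generic decoder shrunk the same way as this
# checkpoint (fewer layers, smaller hidden/intermediate sizes, fewer
# heads, GQA with a single KV head).
config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=1,
    vocab_size=1024,
)

torch.manual_seed(0)  # reproducible random weights
model = LlamaForCausalLM(config)  # random init; nothing is downloaded

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")
```

The resulting model is far too small to produce meaningful text, but it exercises the same code paths (attention, MLP, generation loop) as the full model, which is all the `transformers` test suite needs.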