---
library_name: transformers
---

# tiny-mimo-v2-flash

A ~2.34B-parameter tiny random-weight checkpoint of [XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash), used for internal testing of the native Hugging Face implementation in `transformers`.

## Configuration

| Hyperparameter | Value | Original MiMo |
|----------------|-------|---------------|
| `num_hidden_layers` | 5 | 48 |
| `layer_types` | `[full, sliding×4]` | matches pattern |
| `mlp_layer_types` | `[dense, sparse×4]` | matches pattern (layer 0 dense, rest MoE) |
| `hidden_size` | 2048 | 4096 (**ratio 2.0**) |
| `intermediate_size` | 8192 | 16384 (**ratio 2.0**) |
| `moe_intermediate_size` | 1024 | 2048 (**ratio 2.0**) |
| `num_attention_heads` / `num_key_value_heads` | 16 / 1 | 64 / 4 (**ratio 4.0**) |
| `head_dim` / `v_head_dim` | 192 / 128 | 192 / 128 |
| `n_routed_experts` / `num_experts_per_tok` | 64 / 2 | 256 / 8 (**ratio 4.0**) |
| parameters | 2.34B | 300B |
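The layer patterns in the table can be sketched in plain Python. This is a minimal illustration of the config arithmetic only, not a loading recipe; the exact `layer_types` string values (`full_attention`, `sliding_attention`) are assumptions based on the conventions used elsewhere in `transformers`, and the checkpoint itself is not required.

```python
# Layer pattern from the table: layer 0 uses full attention and a dense MLP,
# the remaining 4 layers use sliding-window attention and sparse (MoE) MLPs.
num_hidden_layers = 5
layer_types = ["full_attention"] + ["sliding_attention"] * (num_hidden_layers - 1)
mlp_layer_types = ["dense"] + ["sparse"] * (num_hidden_layers - 1)

# Grouped-query attention: 16 query heads share 1 KV head here,
# preserving the original model's 64:4 = 16:1 query-to-KV ratio.
num_attention_heads, num_key_value_heads = 16, 1
kv_groups = num_attention_heads // num_key_value_heads

print(layer_types)
print(mlp_layer_types)
print(kv_groups)  # 16
```

Note that although the head counts shrink by 4x, the per-head dimensions (`head_dim` 192, `v_head_dim` 128) are kept identical to the original, so attention-kernel code paths are exercised at full head width.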