MiMo-V2.5 — AWQ W4A16 (int4)
An AWQ W4A16 (4-bit) quantization of XiaomiMiMo/MiMo-V2.5 for vLLM, produced with llm-compressor.
- Scheme: group-wise int4 weights (AWQ), fp16 activations; compressed-tensors pack-quantized format.
- Scope: routed MoE experts only; attention, the MoE router, lm_head, MTP, and the multimodal encoders are kept at original precision.
- Size: ~580 GB (bf16) -> ~159 GB (int4).
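To make "group-wise int4 weights" and "pack-quantized" concrete, here is a minimal pure-Python sketch of the general idea: each group of weights shares one scale, values are rounded to signed 4-bit codes, and eight codes are packed into one 32-bit word. It is not the llm-compressor or compressed-tensors implementation. The symmetric scheme, the tiny `GROUP_SIZE`, and the nibble order are assumptions chosen for illustration, and AWQ's activation-aware scale search is left out entirely (this is plain round-to-nearest).

```python
from typing import List, Tuple

GROUP_SIZE = 4  # assumption for the demo; the checkpoint's real group size is not stated here


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest: one shared scale, codes clamped to [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def pack_int4(codes: List[int]) -> List[int]:
    """Pack signed 4-bit codes, eight per 32-bit word, lowest nibble first (assumed order)."""
    words = []
    for start in range(0, len(codes), 8):
        word = 0
        for j, code in enumerate(codes[start:start + 8]):
            word |= (code & 0xF) << (4 * j)
        words.append(word)
    return words


def unpack_int4(words: List[int], count: int) -> List[int]:
    """Inverse of pack_int4: recover `count` signed 4-bit codes."""
    codes = []
    for word in words:
        for j in range(8):
            nibble = (word >> (4 * j)) & 0xF
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]


def quantize_row(row: List[float]) -> Tuple[List[int], List[float]]:
    """Quantize a weight row group by group; return packed words and per-group scales."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), GROUP_SIZE):
        group_codes, scale = quantize_group(row[start:start + GROUP_SIZE])
        codes.extend(group_codes)
        scales.append(scale)
    return pack_int4(codes), scales


def dequantize_row(words: List[int], scales: List[float], count: int) -> List[float]:
    """Rebuild approximate weights: code times the scale of the group it belongs to."""
    codes = unpack_int4(words, count)
    return [code * scales[i // GROUP_SIZE] for i, code in enumerate(codes)]


row = [7.0, -3.0, 1.0, 0.0, 3.5, -1.0, 0.5, 2.0]
words, scales = quantize_row(row)
restored = dequantize_row(words, scales, len(row))
print(words, scales, restored)
```

Under these assumptions the storage cost per quantized weight is 4 bits plus the amortized cost of one scale per group, which is why the quantized experts shrink to roughly a quarter of their bf16 size while the unquantized layers keep their original footprint.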
Usage
Serve with vLLM (--trust-remote-code). See the base model card for prompt format and capabilities.
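As a rough illustration of the serving step, the sketch below assembles a `vllm serve` command line for this checkpoint. The repository ID is taken from this page; `build_serve_command` is a hypothetical helper, and the tensor-parallel size of 8 is only an assumption, since the right value depends on how many GPUs are available and how much memory each has.

```python
import shlex
from typing import List

MODEL_ID = "spectator2026/MiMo-V2.5-AWQ-int4"


def build_serve_command(model_id: str = MODEL_ID, tensor_parallel_size: int = 8) -> List[str]:
    """Build the argv for vLLM's OpenAI-compatible server (hypothetical helper)."""
    return [
        "vllm", "serve", model_id,
        "--trust-remote-code",  # required by this model card
        "--tensor-parallel-size", str(tensor_parallel_size),  # assumed GPU count
    ]


command = build_serve_command()
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` on a machine where vLLM is installed.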
Model tree for spectator2026/MiMo-V2.5-AWQ-int4
Base model
XiaomiMiMo/MiMo-V2.5