MiMo-V2.5 — AWQ W4A16 (int4)

An AWQ W4A16 (4-bit) quantization of XiaomiMiMo/MiMo-V2.5 for vLLM, produced with llm-compressor.

  • Scheme: group-wise int4 weights (AWQ) with 16-bit activations, stored in the compressed-tensors pack-quantized format.
  • Scope: routed MoE experts only — attention, the MoE router, lm_head, MTP, and the multimodal encoders are kept at original precision.
  • Size: ~580 GB (bf16) -> ~159 GB (int4).
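
As a rough illustration of what "group-wise int4 weights" means, the sketch below quantizes a row of weights in fixed-size groups, each with its own scale. It uses plain symmetric round-to-nearest, not AWQ itself (AWQ additionally rescales weights using activation statistics before quantizing). The group size of 128 and the 16-bit scale are assumptions made for illustration; this card does not state either, and all function names are hypothetical.

```python
# Group-wise int4 quantization sketch (symmetric round-to-nearest, NOT AWQ).
# GROUP_SIZE and SCALE_BITS are assumed values; the model card states neither.
GROUP_SIZE = 128
SCALE_BITS = 16


def quantize_group(weights):
    """Map one group of floats to signed 4-bit codes in [-8, 7] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into consecutive groups and quantize each one."""
    return [
        quantize_group(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


# Deterministic toy weights in [-0.5, 0.5].
row = [((i * 37) % 101 - 50) / 100 for i in range(256)]
groups = quantize_row(row)
restored = [v for codes, scale in groups for v in dequantize_group(codes, scale)]

# Worst-case reconstruction error; bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(row, restored))

# Storage cost per weight: 4 bits for the code plus the amortized scale.
bits_per_weight = 4 + SCALE_BITS / GROUP_SIZE
print(f"groups={len(groups)} max_error={max_error:.4f} bits/weight={bits_per_weight}")
```

Because only the routed experts are quantized and each group carries a scale, the overall reduction stays below the ideal 4x; the sizes quoted above give roughly 580 / 159 ≈ 3.6x.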

Usage

Serve with vLLM, passing --trust-remote-code. See the base model card (XiaomiMiMo/MiMo-V2.5) for the prompt format and capabilities.
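
A minimal sketch of a launch script, assuming the repository id spectator2026/MiMo-V2.5-AWQ-int4 and a working vLLM install. The tensor-parallel size of 8 is a hypothetical value, not a tested recommendation; pick it to match your hardware. The helper names are illustrative only.

```python
import shlex
import subprocess

MODEL_ID = "spectator2026/MiMo-V2.5-AWQ-int4"
TENSOR_PARALLEL_SIZE = 8  # hypothetical; set to the number of GPUs you shard over


def build_serve_command(model_id, tensor_parallel_size):
    """Build the argument list for `vllm serve` with remote code enabled."""
    return [
        "vllm", "serve", model_id,
        "--trust-remote-code",
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]


def launch(command):
    """Start the server and block until it exits (not called in this sketch)."""
    subprocess.run(command, check=True)


command = build_serve_command(MODEL_ID, TENSOR_PARALLEL_SIZE)
# Print the equivalent shell command so it can be copied or inspected.
print(shlex.join(command))
```

Calling launch(command) starts the server; the printed line can also be pasted directly into a shell.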

Safetensors: model size 47B params; tensor types I64, I32, BF16, F32.

Model tree for spectator2026/MiMo-V2.5-AWQ-int4: Quantized (24), this model