Quantizations of MiniMax-M2.7 using the Gutenberg (K_G) quantization strategy.
Available Quants
| Quant | Size | BPW | Mean KLD | Same Top P |
|---|---|---|---|---|
| K_G_5.00 | 133.1 GiB | 5.00 | 0.022412 | 92.447% |
| K_G_4.50 | 119.7 GiB | 4.50 | 0.029416 | 91.311% |
| K_G_4.00 | 106.4 GiB | 4.00 | 0.044050 | 89.497% |
| K_G_3.50 | 93.1 GiB | 3.50 | 0.061226 | 87.641% |
| K_G_3.00 | 79.9 GiB | 3.00 | 0.098738 | 84.454% |
| K_G_2.50 | 66.6 GiB | 2.50 | 0.172875 | 80.034% |
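KLD and Same Top P were measured against Q6_K expert reference logits (8192-token context, 10 chunks). Mean KLD is the average KL divergence between the reference and quantized token distributions (lower is better); Same Top P is the percentage of token positions where both pick the same most likely token (higher is better).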
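Both metrics can be sketched in plain Python, assuming the reference and quantized logits for each token position are available as lists over the same vocabulary. This is an illustrative reimplementation of the standard definitions, not necessarily the tooling used to produce the numbers above; `kld_stats` and the toy logits are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one token position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def argmax(values):
    """Index of the largest value (the most likely token)."""
    return max(range(len(values)), key=values.__getitem__)

def kld_stats(ref_logits, quant_logits):
    """Return (mean KL(P_ref || P_quant), same-top-token percentage).

    Each argument is a list of per-position logit vectors over the same vocabulary.
    """
    if len(ref_logits) != len(quant_logits) or not ref_logits:
        raise ValueError("need the same, non-zero number of positions")
    total_kld = 0.0
    same_top = 0
    for ref, quant in zip(ref_logits, quant_logits):
        log_p = log_softmax(ref)
        log_q = log_softmax(quant)
        # KL(P || Q) = sum_i p_i * (log p_i - log q_i), with P the reference distribution
        total_kld += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
        # Softmax preserves order, so comparing argmax of logits compares top tokens
        if argmax(ref) == argmax(quant):
            same_top += 1
    n = len(ref_logits)
    return total_kld / n, 100.0 * same_top / n

# Toy logits over a 3-token vocabulary at 2 positions (illustration only)
ref = [[2.0, 1.0, 0.1], [0.5, 0.4, 3.0]]
quant = [[1.9, 1.1, 0.0], [0.6, 2.9, 0.3]]
mean_kld, same_top_pct = kld_stats(ref, quant)
print(f"Mean KLD: {mean_kld:.6f}  Same Top P: {same_top_pct:.3f}%")
```

In the toy data the top token agrees at the first position but not the second, so Same Top P comes out at 50%. Comparing a model against itself gives a mean KLD of exactly 0 and a Same Top P of 100%.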
vs Standard Quants (unsloth)
| Gutenberg | BPW | Mean KLD | Standard (unsloth) | BPW | Mean KLD |
|---|---|---|---|---|---|
| K_G_2.50 | 2.50 | 0.172875 | UD-IQ2_M | 2.45 | 0.191059 |
| K_G_3.00 | 3.00 | 0.098738 | UD-IQ3_XXS | 2.80 | 0.119762 |
| K_G_3.50 | 3.50 | 0.061226 | UD-Q3_K_M | 3.54 | 0.063647 |
| K_G_4.00 | 4.00 | 0.044050 | UD-IQ4_XS | 3.79 | 0.051081 |
| K_G_5.00 | 5.00 | 0.022412 | UD-Q4_K_M | 4.90 | 0.024529 |
Why Gutenberg?
Standard quantization applies the same fixed rules to all tensors. Gutenberg instead uses measured KLD sensitivity data to allocate precision where it matters most: it upgrades the tensors with the highest measured impact on output quality and keeps less important tensors at the base level.
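The result is lower mean KLD than standard quants of similar size: in every pairing in the comparison above, the Gutenberg quant scores lower than its unsloth counterpart, although some pairs are not at identical BPW (for example, K_G_3.00 at 3.00 BPW vs UD-IQ3_XXS at 2.80 BPW).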
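The sensitivity-driven allocation idea can be illustrated with a greedy sketch. The actual Gutenberg recipe is not described here, so everything in this example is an assumption: two precision levels, a per-tensor KLD reduction measured in advance, and a target average BPW. Tensors are upgraded in order of KLD reduction per extra bit spent until the budget runs out. `TensorInfo`, `allocate`, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TensorInfo:
    name: str        # hypothetical GGUF-style tensor name
    n_params: int    # number of weights in the tensor
    kld_gain: float  # hypothetical: measured drop in mean KLD if this tensor is upgraded

def allocate(tensors, base_bpw, upgrade_bpw, target_bpw):
    """Greedily upgrade the most KLD-sensitive tensors within an average-BPW budget.

    Returns a dict mapping tensor name -> assigned bits per weight.
    """
    total_params = sum(t.n_params for t in tensors)
    budget_bits = target_bpw * total_params
    used_bits = base_bpw * total_params          # every tensor starts at the base level
    extra = upgrade_bpw - base_bpw               # extra bits per weight for an upgrade
    assignment = {t.name: base_bpw for t in tensors}
    # Rank by KLD reduction per extra bit spent, so cheap, sensitive tensors go first
    ranked = sorted(tensors, key=lambda t: t.kld_gain / (t.n_params * extra), reverse=True)
    for t in ranked:
        cost = t.n_params * extra
        if used_bits + cost <= budget_bits:      # skip tensors that no longer fit
            assignment[t.name] = upgrade_bpw
            used_bits += cost
    return assignment

# Toy numbers for illustration only; they are not measurements of MiniMax-M2.7
tensors = [
    TensorInfo("attn_v", 1_000, 0.010),
    TensorInfo("ffn_down_exps", 8_000, 0.020),
    TensorInfo("ffn_up_exps", 8_000, 0.004),
    TensorInfo("token_embd", 3_000, 0.001),
]
plan = allocate(tensors, base_bpw=4.0, upgrade_bpw=6.0, target_bpw=5.0)
print(plan)
```

With a 5.00 BPW target, the toy run upgrades `attn_v` and `ffn_down_exps` (an average of 4.90 BPW) and leaves the other two at 4.0, since neither fits in the remaining budget. This sketch only models two precision levels and treats each tensor's KLD gain as independent of the others.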
Compatibility
Fully compatible with stock llama.cpp (including llama-server), LM Studio, and any other GGUF runtime that supports this model's architecture. No custom builds required.