## How to use with llama.cpp

### Install with Homebrew

```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
llama-cli -hf Goldkoron/MiniMax-M2.7
```

### Install with WinGet (Windows)

```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
llama-cli -hf Goldkoron/MiniMax-M2.7
```

### Use a pre-built binary

```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
./llama-cli -hf Goldkoron/MiniMax-M2.7
```

### Build from source

```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Goldkoron/MiniMax-M2.7
```

### Use Docker

```shell
docker model run hf.co/Goldkoron/MiniMax-M2.7
```
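Once `llama-server` is running, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library, assuming the server is reachable at its default address of `http://localhost:8080` (change it with `--port`); the endpoint path is llama-server's standard OpenAI-compatible route:

```python
import json
import urllib.request

# A standard OpenAI-style chat completion request. llama-server serves
# whichever model it was started with, but the "model" field must be present.
payload = {
    "model": "MiniMax-M2.7",
    "messages": [
        {"role": "user", "content": "Summarize what GGUF quantization is."}
    ],
    "max_tokens": 256,
}

def chat(base_url: str = "http://localhost:8080") -> str:
    """POST the request to llama-server's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Call `chat()` with the server started as shown above; it returns the assistant's reply as a string.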
# MiniMax-M2.7 – Gutenberg Quants

Quantizations of MiniMax-M2.7 using the Gutenberg (K_G) quantization strategy.

## Available Quants

| Quant | Size | BPW | Mean KLD | Same Top P |
|---|---|---|---|---|
| K_G_5.00 | 133.1 GiB | 5.00 | 0.022412 | 92.447% |
| K_G_4.50 | 119.7 GiB | 4.50 | 0.029416 | 91.311% |
| K_G_4.00 | 106.4 GiB | 4.00 | 0.044050 | 89.497% |
| K_G_3.50 | 93.1 GiB | 3.50 | 0.061226 | 87.641% |
| K_G_3.00 | 79.9 GiB | 3.00 | 0.098738 | 84.454% |
| K_G_2.50 | 66.6 GiB | 2.50 | 0.172875 | 80.034% |
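BPW (bits per weight) maps almost directly to file size: size ≈ params × BPW / 8 bytes. A quick sanity check of the table for this 229B-parameter model (small deviations from the listed sizes are expected, since metadata and mixed-precision tensors shift the exact figure):

```python
def est_size_gib(params: float, bpw: float) -> float:
    """Estimated GGUF file size in GiB: params * bits-per-weight / 8 bytes."""
    return params * bpw / 8 / 2**30

# 229B parameters at 5.00 BPW -> roughly 133.3 GiB (table lists 133.1 GiB)
print(round(est_size_gib(229e9, 5.00), 1))
```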

KLD and Same Top P measured against Q6_K expert reference logits (8192 context, 10 chunks).
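For reference, mean KLD is the average Kullback-Leibler divergence between the quant's and the reference model's per-token softmax distributions, and Same Top P is the fraction of tokens where both pick the same most-likely token. A minimal pure-Python sketch of both metrics (illustrative only; the published numbers come from llama.cpp's perplexity tooling, not this code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kld(ref_logits, quant_logits):
    """KL divergence D(ref || quant) between the two softmax distributions."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top(ref_logits, quant_logits):
    """True if both distributions agree on the most-likely token."""
    argmax = lambda xs: max(range(len(xs)), key=xs.__getitem__)
    return argmax(ref_logits) == argmax(quant_logits)
```

Averaging `kld` and `same_top` over every token position of a test corpus yields the two columns above.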

## vs Standard Quants (unsloth)

| Gutenberg | BPW | KLD | Standard (unsloth) | BPW | KLD |
|---|---|---|---|---|---|
| K_G_2.50 | 2.50 | 0.172875 | UD-IQ2_M | 2.45 | 0.191059 |
| K_G_3.00 | 3.00 | 0.098738 | UD-IQ3_XXS | 2.80 | 0.119762 |
| K_G_3.50 | 3.50 | 0.061226 | UD-Q3_K_M | 3.54 | 0.063647 |
| K_G_4.00 | 4.00 | 0.044050 | UD-IQ4_XS | 3.79 | 0.051081 |
| K_G_5.00 | 5.00 | 0.022412 | UD-Q4_K_M | 4.90 | 0.024529 |

## Why Gutenberg?

Standard quantization applies uniform rules to all tensors. Gutenberg uses KLD sensitivity data to allocate precision where it matters most, upgrading the tensors that have the highest measured impact on output quality while keeping less important tensors at the base level.

The result is measurably better quality than standard quants at the same model size, as the KLD comparison above shows.
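The allocation idea can be sketched as a greedy loop: spend a fixed extra-bit budget on the tensors whose measured KLD impact is highest. This is an illustrative sketch only, with hypothetical tensor names and sensitivity scores; it is not the actual Gutenberg tooling:

```python
def allocate_precision(sensitivity, budget, base_bits=3.0, upgraded_bits=5.0):
    """Greedily upgrade the most KLD-sensitive tensors until the extra
    bit budget is spent.

    sensitivity: {tensor_name: measured KLD impact of quantizing that tensor}
    budget: total extra bits-per-weight available, summed across tensors
    Returns {tensor_name: assigned bits-per-weight}.
    """
    plan = {name: base_bits for name in sensitivity}
    cost = upgraded_bits - base_bits  # extra bits spent per upgraded tensor
    # Upgrade the most sensitive tensors first.
    for name in sorted(sensitivity, key=sensitivity.get, reverse=True):
        if budget >= cost:
            plan[name] = upgraded_bits
            budget -= cost
    return plan

# Hypothetical sensitivities: attention output hurts most when quantized.
scores = {"attn_output": 0.09, "ffn_down": 0.04, "ffn_up": 0.01}
print(allocate_precision(scores, budget=4.0))
```

With a budget of 4.0 extra bits and an upgrade cost of 2.0 per tensor, the two most sensitive tensors land at 5.0 BPW and the rest stay at the 3.0 BPW base.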

## Compatibility

Fully compatible with stock llama.cpp, llama-server, LM Studio, and any GGUF-compatible runtime. No custom builds required.

## Model Details

- Format: GGUF
- Parameters: 229B
- Architecture: minimax-m2