How to use with llama.cpp

Install from WinGet (Windows)

```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
llama-cli -hf Goldkoron/MiniMax-M2.7
```
Use pre-built binary

```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
./llama-cli -hf Goldkoron/MiniMax-M2.7
```
Build from source code

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Goldkoron/MiniMax-M2.7
```
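Once llama-server is running (it listens on http://localhost:8080 by default), any OpenAI-compatible client can talk to it. A minimal sketch using curl against the standard /v1/chat/completions endpoint; the prompt and sampling parameters are illustrative:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Explain KL divergence in one sentence."}],
        "temperature": 0.7,
        "max_tokens": 128
      }'
```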
Use Docker

```sh
docker model run hf.co/Goldkoron/MiniMax-M2.7
```
MiniMax-M2.7 Gutenberg Quants
Quantizations of MiniMax-M2.7 using the Gutenberg (K_G) quantization strategy.
Available Quants
| Quant | Size | BPW | Mean KLD | Same Top P |
|---|---|---|---|---|
| K_G_5.00 | 133.1 GiB | 5.00 | 0.022412 | 92.447% |
| K_G_4.50 | 119.7 GiB | 4.50 | 0.029416 | 91.311% |
| K_G_4.00 | 106.4 GiB | 4.00 | 0.044050 | 89.497% |
| K_G_3.50 | 93.1 GiB | 3.50 | 0.061226 | 87.641% |
| K_G_3.00 | 79.9 GiB | 3.00 | 0.098738 | 84.454% |
| K_G_2.50 | 66.6 GiB | 2.50 | 0.172875 | 80.034% |
KLD and Same Top P measured against Q6_K expert reference logits (8192 context, 10 chunks).
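Statistics like these are the kind produced by llama.cpp's llama-perplexity tool, which can save reference logits from one model and then score another against them. A sketch of that workflow, with illustrative file names and a text dataset of your choosing (the exact dataset used for the numbers above is not stated here):

```sh
# 1. Save reference logits from the baseline model:
./build/bin/llama-perplexity -m minimax-m2.7-q6_k.gguf -f eval-text.txt \
  -c 8192 --chunks 10 --kl-divergence-base ref-logits.bin

# 2. Score a quant against the saved logits; this reports Mean KLD,
#    "Same top p", and related statistics:
./build/bin/llama-perplexity -m minimax-m2.7-k_g_4.00.gguf \
  --kl-divergence-base ref-logits.bin --kl-divergence
```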
vs Standard Quants (unsloth)
| Gutenberg | BPW | KLD | Standard (unsloth) | BPW | KLD |
|---|---|---|---|---|---|
| K_G_2.50 | 2.50 | 0.172875 | UD-IQ2_M | 2.45 | 0.191059 |
| K_G_3.00 | 3.00 | 0.098738 | UD-IQ3_XXS | 2.80 | 0.119762 |
| K_G_3.50 | 3.50 | 0.061226 | UD-Q3_K_M | 3.54 | 0.063647 |
| K_G_4.00 | 4.00 | 0.044050 | UD-IQ4_XS | 3.79 | 0.051081 |
| K_G_5.00 | 5.00 | 0.022412 | UD-Q4_K_M | 4.90 | 0.024529 |
Why Gutenberg?
Standard quantization applies uniform rules to all tensors. Gutenberg uses KLD sensitivity data to allocate precision where it matters most, upgrading the tensors that have the highest measured impact on output quality while keeping less important tensors at the base level.
The result is significantly better quality than standard quants at the same model size.
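As a rough illustration of the idea, stock llama-quantize already supports per-tensor type overrides; the sketch below is not the actual Gutenberg recipe, and the tensor names and types chosen are hypothetical:

```sh
# Quantize with a Q3_K_M baseline, but bump the tensors that a KLD
# sensitivity analysis flagged as most impactful (names and types here
# are hypothetical, not the actual Gutenberg allocation):
./build/bin/llama-quantize \
  --tensor-type attn_output=q6_k \
  --tensor-type ffn_down=q5_k \
  minimax-m2.7-f16.gguf minimax-m2.7-custom.gguf Q3_K_M
```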
Compatibility
Fully compatible with stock llama.cpp, llama-server, LM Studio, and any GGUF-compatible runtime. No custom builds required.
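Since several quant files live in this repo, llama.cpp's -hf flag can take a quant tag after a colon to pick one; whether a given tag resolves depends on how the files are named, so treat the tag below as an assumption:

```sh
# Fetch and serve a specific quant by tag (tag assumed to match the
# K_G_4.00 file naming in the repo):
./llama-server -hf Goldkoron/MiniMax-M2.7:K_G_4.00 -c 8192
```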
Base model: MiniMaxAI/MiniMax-M2.7
Install from brew

```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Goldkoron/MiniMax-M2.7

# Run inference directly in the terminal:
llama-cli -hf Goldkoron/MiniMax-M2.7
```