---
license: other
base_model: MiniMaxAI/MiniMax-M2.5
tags:
  - gguf
  - llama.cpp
  - quantized
  - moe
---

# MiniMax-M2.5 GGUF

GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | MiniMaxAI/MiniMax-M2.5 |
| **Architecture** | Mixture of Experts (MoE) |
| **Total parameters** | 230B |
| **Active parameters** | 10B per token |
| **Layers** | 62 |
| **Total experts** | 256 |
| **Active experts per token** | 8 |
| **Source precision** | FP8 (`float8_e4m3fn`) |

## Available Quantizations

| Quantization | Size | Description |
|-------------|------|-------------|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |
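
As a rough sanity check on the table above, the file sizes can be converted into average bits per weight. This is a minimal sketch, assuming the listed sizes are decimal gigabytes (10^9 bytes) and that the 230B total parameter count is exact. `bits_per_weight` is a hypothetical helper written for this example and is not part of llama.cpp.

```bash
# bits_per_weight SIZE_GB PARAMS_BILLIONS
# Assumption: SIZE_GB is decimal gigabytes, so GB * 8 / billions of params
# gives average bits per weight. The result is approximate.
bits_per_weight() {
  awk -v size_gb="$1" -v params_b="$2" \
    'BEGIN { printf "%.2f\n", size_gb * 8 / params_b }'
}

# Sizes copied from the table; 230 is the total parameter count in billions.
for entry in Q8_0:227 Q4_K_M:129 IQ3_S:92 Q2_K:78; do
  name=${entry%%:*}   # text before the colon
  size=${entry##*:}   # text after the colon
  printf '%-7s %s bits/weight\n' "$name" "$(bits_per_weight "$size" 230)"
done
```

If the sizes are actually reported in GiB, each figure comes out roughly 7% higher.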

## Usage

These GGUFs can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible frontends.

```bash
# Example with llama-cli: generate up to 128 tokens from a short prompt
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```

## Notes

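- The source model uses FP8 (`float8_e4m3fn`) precision. Q8_0 stores 8-bit integers with one FP16 scale per block of 32 weights (about 8.5 bits per weight), so it is near-lossless relative to the source weights, though not bit-exact.
- This is a large MoE model. Even the smallest quant (Q2_K) is ~78 GB, because all 256 experts must be stored even though only 8 are active per token.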
- Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
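
Given the sizes listed above, choosing a quant comes down to how much memory is available for the weights. The following is a minimal sketch, assuming an integer budget in GB that covers the weights only. The KV cache and runtime buffers need additional memory on top of that. `pick_quant` is a hypothetical helper, not a llama.cpp command.

```bash
# pick_quant BUDGET_GB: print the largest listed quant whose file fits.
# Assumption: BUDGET_GB is a whole number of GB available for weights alone.
pick_quant() {
  local budget_gb="$1" entry name size
  # Ordered largest to smallest, so the first match is the highest-quality fit.
  for entry in Q8_0:227 Q4_K_M:129 IQ3_S:92 Q2_K:78; do
    name=${entry%%:*}
    size=${entry##*:}
    if [ "$size" -le "$budget_gb" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "no listed quant fits in ${budget_gb} GB" >&2
  return 1
}

pick_quant 128   # prints IQ3_S, since Q4_K_M (129 GB) is just over budget
```

In practice it is worth leaving headroom below the budget, since context length and batch size also affect memory use.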