---
license: other
base_model: MiniMaxAI/MiniMax-M2.5
tags:
- gguf
- llama.cpp
- quantized
- moe
---
# MiniMax-M2.5 GGUF
GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with [llama.cpp](https://github.com/ggerganov/llama.cpp).
## Model Details
| Property | Value |
|----------|-------|
| **Base model** | MiniMaxAI/MiniMax-M2.5 |
| **Architecture** | Mixture of Experts (MoE) |
| **Total parameters** | 230B |
| **Active parameters** | 10B per token |
| **Layers** | 62 |
| **Total experts** | 256 |
| **Active experts per token** | 8 |
| **Source precision** | FP8 (`float8_e4m3fn`) |
## Available Quantizations
| Quantization | Size | Description |
|-------------|------|-------------|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |
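To compare these sizes on a common scale, it can help to convert each one into an approximate bits-per-weight figure. The sketch below is only a rough estimate: it assumes "GB" means 10^9 bytes and that the model has exactly 230 billion parameters, neither of which this README confirms. The helper name `bits_per_weight` is made up for illustration.

```bash
# Rough bits-per-weight estimate: (size in GB * 8) / (parameters in billions).
# Assumption: 1 GB = 10^9 bytes and 230e9 parameters; both are approximations.
bits_per_weight() {
  # $1 = file size in GB, $2 = total parameters in billions
  LC_ALL=C awk -v gb="$1" -v params="$2" 'BEGIN { printf "%.2f\n", gb * 8 / params }'
}

# Sizes copied from the table above.
while read -r name size_gb; do
  printf '%-8s %s bits/weight\n' "$name" "$(bits_per_weight "$size_gb" 230)"
done <<'EOF'
Q8_0 227
Q4_K_M 129
IQ3_S 92
Q2_K 78
EOF
```

Under these assumptions the estimates come out to roughly 7.9, 4.5, 3.2, and 2.7 bits per weight. If the listed sizes are actually in GiB, the real figures would be somewhat higher.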
## Usage
These GGUFs can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible frontends.
```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```
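With files this large, an interrupted or partial download is a plausible failure mode. As a minimal sanity check, the sketch below tests that a file starts with the 4-byte `GGUF` magic before handing it to `llama-cli`. The helper name `is_gguf` is hypothetical, and the check only inspects the first bytes, so it cannot detect a file that is truncated later on.

```bash
# Returns 0 if the file exists and starts with the 4-byte GGUF magic, 1 otherwise.
# This is a cheap header check only; it does not validate the rest of the file.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

model="MiniMax-M2.5-Q4_K_M.gguf"
if is_gguf "$model"; then
  llama-cli -m "$model" -p "Hello" -n 128
else
  echo "not a GGUF file (missing or incomplete download?): $model" >&2
fi
```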
## Notes
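- The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is close to lossless relative to the source weights. It is not bit-exact, because Q8_0 stores 8-bit integers with a per-block scale rather than FP8 values.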
- This is a large MoE model. Even the smallest quant (Q2_K) is ~78 GB, because the weights of every expert must be stored even though only 8 experts are active per token.
- Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
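
Given the sizes involved, one quick way to choose a file is to pick the largest quantization that fits a storage or memory budget. The sketch below does that with the sizes listed in this README. The function name `pick_quant` is hypothetical, the budget must be a whole number of GB, and file size is only a lower bound: running the model also needs memory for the KV cache and runtime buffers.

```bash
# Prints the largest listed quantization whose file size is at most the budget (in GB).
# Returns a non-zero status and prints nothing if no listed quant fits.
# Sizes are copied from this README, ordered from largest to smallest.
pick_quant() {
  budget_gb="$1"
  best=""
  while read -r name size_gb; do
    if [ "$size_gb" -le "$budget_gb" ]; then
      best="$name"
      break
    fi
  done <<'EOF'
Q8_0 227
Q4_K_M 129
IQ3_S 92
Q2_K 78
EOF
  [ -n "$best" ] && echo "$best"
}

# Example: a 128 GB budget is just under the Q4_K_M size, so this selects IQ3_S.
pick_quant 128
```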