---
license: other
base_model: MiniMaxAI/MiniMax-M2.5
tags:
- gguf
- llama.cpp
- quantized
- moe
---

# MiniMax-M2.5 GGUF

GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | MiniMaxAI/MiniMax-M2.5 |
| **Architecture** | Mixture of Experts (MoE) |
| **Total parameters** | 230B |
| **Active parameters** | 10B per token |
| **Layers** | 62 |
| **Total experts** | 256 |
| **Active experts per token** | 8 |
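
As a rough sanity check on the sizes in the next table: Q8_0 stores each weight as 8 bits plus one fp16 scale per 32-weight block, i.e. about 8.5 bits per weight, which reproduces the listed Q8_0 size almost exactly when the table's sizes are read as binary gigabytes (GiB):

```bash
# Q8_0 ≈ 8.5 bits/weight (8-bit values + one fp16 scale per 32-weight block)
# 230e9 weights × 8.5 bits ÷ 8 bits/byte ÷ 2^30 bytes/GiB
echo "230 * 10^9 * 8.5 / 8 / 2^30" | bc -l   # ≈ 227.6, matching the 227 GB entry
```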

## Available Quantizations

| Quantization | Size | Description |
|-------------|------|-------------|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |
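
To fetch a single quant instead of the whole repository, `huggingface-cli download` with an `--include` pattern works; the repo id below is a placeholder for wherever these files are hosted, and large quants may be split across several shard files:

```bash
# Placeholder repo id -- substitute the actual id of this GGUF repo
huggingface-cli download your-org/MiniMax-M2.5-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./MiniMax-M2.5-GGUF
```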

## Usage

These GGUFs can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```
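
The same file also works with `llama-server` for an OpenAI-compatible HTTP API; the flags below are illustrative defaults, and `-ngl` can be omitted for CPU-only inference:

```bash
# Serve an OpenAI-compatible API; -c sets the context window,
# -ngl 99 offloads all layers to the GPU (drop it for CPU-only)
llama-server -m MiniMax-M2.5-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080
```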

## Notes

- The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is effectively lossless relative to the source weights.
- This is a large MoE model: only 8 of 256 experts run per token, but all expert weights must be resident in memory, so even the smallest quant (Q2_K) requires ~78 GB (see the offload sketch after this list).
- Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
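
If the model does not fit in VRAM, a common llama.cpp pattern for large MoE models is to offload everything with `-ngl` and then pin the bulky expert FFN tensors back to system RAM with `--override-tensor` (`-ot`). This is a sketch under the assumption that the expert tensors follow llama.cpp's usual `ffn_*_exps` naming; verify the actual tensor names in the model's load log before relying on the regex:

```bash
# Sketch: attention/shared weights on GPU, expert FFN tensors in system RAM.
# The "ffn_.*_exps" regex assumes llama.cpp's usual expert-tensor naming.
llama-cli -m MiniMax-M2.5-Q2_K.gguf -ngl 99 \
  -ot "ffn_.*_exps=CPU" \
  -p "Hello" -n 128
```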