---
license: other
base_model: MiniMaxAI/MiniMax-M2.5
tags:
- gguf
- llama.cpp
- quantized
- moe
---

# MiniMax-M2.5 GGUF

GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | MiniMaxAI/MiniMax-M2.5 |
| **Architecture** | Mixture of Experts (MoE) |
| **Total parameters** | 230B |
| **Active parameters** | 10B per token |
| **Layers** | 62 |
| **Total experts** | 256 |
| **Active experts per token** | 8 |
| **Source precision** | FP8 (`float8_e4m3fn`) |

## Available Quantizations

| Quantization | Size | Description |
|-------------|------|-------------|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |

## Usage

These GGUFs can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible frontends.

```bash
# Example with llama-cli
# -m: model file, -p: prompt, -n: number of tokens to generate
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```

## Notes

- The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is effectively lossless relative to the source weights.
- This is a large MoE model. The weights of all 256 experts must be stored even though only 8 are active per token, so even the smallest quant (Q2_K) requires ~78 GB.
- Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
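
Because every quant listed above is large, it can help to check which one fits a given memory budget before downloading. The following is a minimal sketch, not part of llama.cpp: `pick_quant` and `HEADROOM_GB` are hypothetical names introduced here, the sizes are copied from the Available Quantizations table, and the 8 GB headroom for context and runtime buffers is an assumed placeholder rather than a measured figure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the largest quant whose file fits a memory budget.
# HEADROOM_GB is an assumed allowance for KV cache and buffers, not a measurement.
HEADROOM_GB="${HEADROOM_GB:-8}"

pick_quant() {
  local budget_gb="$1"
  local usable=$(( budget_gb - HEADROOM_GB ))
  local entry name size
  # Entries are name:size_in_GB from the table, ordered largest to smallest,
  # so the first one that fits is the highest-quality choice.
  for entry in Q8_0:227 Q4_K_M:129 IQ3_S:92 Q2_K:78; do
    name="${entry%%:*}"
    size="${entry##*:}"
    if (( size <= usable )); then
      echo "$name"
      return 0
    fi
  done
  echo "none"
  return 1
}

pick_quant 128   # prints IQ3_S with the default 8 GB headroom
```

Actual memory use depends on context length and backend, so the headroom may need to be raised (for example `HEADROOM_GB=16 pick_quant 128`) for long prompts.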