---
license: other
base_model: MiniMaxAI/MiniMax-M2.5
tags:
  - gguf
  - llama.cpp
  - quantized
  - moe
---

MiniMax-M2.5 GGUF

GGUF quantizations of MiniMaxAI/MiniMax-M2.5, created with llama.cpp.

Model Details

| Property | Value |
|---|---|
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (float8_e4m3fn) |

Available Quantizations

| Quantization | Size | Description |
|---|---|---|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |
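
As a rough sanity check on the sizes above, the effective bits per weight can be estimated as file size times 8 divided by the parameter count. The sketch below assumes the listed sizes are decimal gigabytes and that the total is exactly 230B parameters; both are approximations, and the helper name `bits_per_weight` is made up for this example.

```shell
#!/bin/sh
# Rough estimate: bits per weight = size in GB * 8 / parameters in billions.
# Assumption: table sizes are decimal GB (1e9 bytes); if they are GiB,
# the real figures are about 7% higher.
bits_per_weight() {
  # $1 = file size in GB, $2 = total parameters in billions
  awk -v gb="$1" -v params="$2" 'BEGIN { printf "%.2f\n", gb * 8 / params }'
}

# Sizes copied from the table above; 230 is the total parameter count in billions.
for entry in Q8_0:227 Q4_K_M:129 IQ3_S:92 Q2_K:78; do
  name=${entry%%:*}
  size=${entry##*:}
  printf '%s\t%s bits/weight\n' "$name" "$(bits_per_weight "$size" 230)"
done
```

The result is an average over the whole file, including quantization scales and any tensors kept at higher precision, so it will not match the nominal bit width in the quantization name.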

Usage

These GGUFs can be used with llama.cpp and compatible frontends.

```shell
# Example with llama-cli
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```
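
Choosing a quantization usually comes down to how much memory is available. The following is a minimal sketch of that choice, not part of llama.cpp: `pick_quant` is a hypothetical helper that returns the largest quantization whose file size fits a whole-number budget in GB, using the sizes from the table above. The file name pattern for quants other than Q4_K_M is assumed from the example above, and the command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: pick the largest quant whose file size fits the budget.
# Sizes (GB) are copied from the "Available Quantizations" table.
# The budget must be an integer number of GB.
pick_quant() {
  budget_gb=$1
  # Entries are ordered from largest to smallest, so the first fit is the best fit.
  for entry in Q8_0:227 Q4_K_M:129 IQ3_S:92 Q2_K:78; do
    name=${entry%%:*}
    size=${entry##*:}
    if [ "$budget_gb" -ge "$size" ]; then
      echo "$name"
      return 0
    fi
  done
  return 1  # nothing fits
}

# Print (do not run) the matching llama-cli command for a 140 GB budget.
# Assumption: every file is named MiniMax-M2.5-<QUANT>.gguf.
quant=$(pick_quant 140) && echo "llama-cli -m MiniMax-M2.5-${quant}.gguf -p \"Hello\" -n 128"
```

File size is only a lower bound on memory use. The KV cache and runtime buffers need additional room, so it is safer to pass a budget somewhat below the memory actually available.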

Notes

  • The source model is stored in FP8 (float8_e4m3fn), so Q8_0 stays very close to the source weights. It is not bit-exact: Q8_0 stores 8-bit integers with a per-block scale rather than FP8 values.
  • This is a large MoE model. All expert weights must be stored even though only 8 of the 256 experts are active per token, so even the smallest quant (Q2_K) is about 78 GB.
  • Quantized from the official MiniMaxAI/MiniMax-M2.5 weights.