# MiniMax-M2.5 GGUF

GGUF quantization of MiniMaxAI/MiniMax-M2.5, created with llama.cpp.

## Model Details

| Property | Value |
|---|---|
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (float8_e4m3fn) |

## Available Quantizations

| Quantization | Size | Description |
|---|---|---|
| Q6_K | 175 GB | 6-bit K-quant, strong quality/size balance |
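
As a rough sanity check, the listed file size is consistent with Q6_K's nominal ~6.56 bits per weight (the exact bits-per-weight varies slightly with tensor shapes and the handful of tensors kept at higher precision):

```bash
# Approximate Q6_K file size: 230B weights at ~6.56 bits/weight, in GiB
awk 'BEGIN { printf "%.1f GiB\n", 230e9 * 6.56 / 8 / 2^30 }'
# ~175.7 GiB
```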

## Usage

These GGUFs can be used with llama.cpp and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5.Q6_K.gguf -p "Hello" -n 128
```
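
The model can also be served over an OpenAI-compatible HTTP API with llama-server; a minimal sketch, where the context size and port are illustrative choices rather than values from this card:

```bash
# Serve the model over an OpenAI-compatible HTTP API on localhost:8080
llama-server -m MiniMax-M2.5.Q6_K.gguf -c 4096 --port 8080
```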

## Notes

- The source model uses FP8 (float8_e4m3fn) precision.
- This is a large MoE model and requires significant memory: at Q6_K the weights alone occupy 175 GB, before KV cache and activations. If the model does not fit in VRAM, the expert tensors can be kept in system RAM (see the sketch after this list).
- Quantized from the official MiniMaxAI/MiniMax-M2.5 weights.
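
A common way to run MoE models that exceed VRAM is to keep attention and shared weights on the GPU while parking the per-expert FFN tensors in CPU RAM via llama.cpp's `--override-tensor` (`-ot`) flag. A minimal sketch; the tensor-name pattern below is an assumption about this GGUF's naming, so inspect the file's tensor names if it does not match:

```bash
# Offload all layers to GPU, then override the MoE expert FFN tensors back to CPU RAM
# (".ffn_.*_exps.=CPU" is an assumed name pattern; verify against this GGUF's tensors)
llama-cli -m MiniMax-M2.5.Q6_K.gguf -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -p "Hello" -n 128
```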