	---
	license: other
	base_model: MiniMaxAI/MiniMax-M2.5
	tags:
	- gguf
	- llama.cpp
	- quantized
	- moe
	---

	# MiniMax-M2.5 GGUF

	GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with [llama.cpp](https://github.com/ggerganov/llama.cpp).

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base model | MiniMaxAI/MiniMax-M2.5 |
	| Architecture | Mixture of Experts (MoE) |
	| Total parameters | 230B |
	| Active parameters | 10B per token |
	| Layers | 62 |
	| Total experts | 256 |
	| Active experts per token | 8 |
	| Source precision | FP8 (`float8_e4m3fn`) |

	## Available Quantizations

	| Quantization | Size | Description |
	|--------------|------|-------------|
	| Q8_0 | 227 GB | 8-bit quantization, highest quality |
	| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
	| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
	| Q2_K | 78 GB | 2-bit K-quant, smallest size |
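
As a rough aid for choosing a file, the sketch below picks the largest quantization whose file size fits a given memory budget. The sizes are copied from the table; `pick_quant` is a hypothetical helper, not part of llama.cpp, and the budget is assumed to be given in whole GB.

```shell
#!/usr/bin/env bash
# Pick the largest quantization whose file size fits a memory budget (in GB).
# Sizes come from the quantization table; pick_quant is a hypothetical helper.
pick_quant() {
  local budget_gb="$1"
  local entry name size
  # Ordered from largest to smallest, so the first fit is the highest quality.
  for entry in "Q8_0:227" "Q4_K_M:129" "IQ3_S:92" "Q2_K:78"; do
    name="${entry%%:*}"
    size="${entry##*:}"
    if [ "$budget_gb" -ge "$size" ]; then
      echo "$name"
      return 0
    fi
  done
  # Nothing fits the budget.
  echo "none"
  return 1
}

pick_quant 128   # prints IQ3_S
```

File size is only a lower bound on memory use: the KV cache and compute buffers need additional room, so it is sensible to pass a budget somewhat below the memory actually available.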

	## Usage

	These GGUFs can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible frontends.

	```bash
	# Example with llama-cli
	llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
	```
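
One possible way to fetch a single quantization before running it is sketched below. It rests on assumptions: the repository id `marksverdhei/MiniMax-M2.5-GGUF` is taken from this repository's path, the `*Q4_K_M*` file pattern follows the file name used in the example above and may not match the actual files (large quants are sometimes split into several parts), and `run` is a hypothetical wrapper that only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Assumed values: repo id from this repository's path; the file pattern follows
# the llama-cli example above and may not match the real file names.
REPO="marksverdhei/MiniMax-M2.5-GGUF"
QUANT="${QUANT:-Q4_K_M}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands without running them

# Hypothetical wrapper: echo the command, execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Download only the files matching the chosen quantization.
run huggingface-cli download "$REPO" --include "*${QUANT}*" --local-dir .

# Run a short generation with the downloaded file.
run llama-cli -m "MiniMax-M2.5-${QUANT}.gguf" -p "Hello" -n 128
```

Running the script as-is only prints the two commands. Setting `DRY_RUN=0` executes them, which requires the `huggingface_hub` CLI and llama.cpp to be installed and enough disk space for the chosen file.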

	## Notes

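	- The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 stays very close to the source weights. It is not bit-exact, because Q8_0 stores 8-bit integers with a per-block scale rather than FP8 values.
	- This is a large MoE model. Even the smallest quant (Q2_K) requires ~78 GB, because all 256 experts must be stored even though only 8 are active per token.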
	- Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
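
To put the file sizes in perspective, the effective bits per weight can be estimated as file size in bits divided by the total parameter count. The sketch below assumes the listed sizes are decimal gigabytes and uses the 230B parameter figure from the model details; `bits_per_weight` is a hypothetical helper. If the sizes are actually GiB, the results would be roughly 7% higher.

```shell
#!/usr/bin/env bash
# Effective bits per weight = (file size in GB * 8) / (parameters in billions).
# Assumes decimal GB and 230B total parameters; bits_per_weight is hypothetical.
bits_per_weight() {
  awk -v size_gb="$1" -v params_b="$2" \
    'BEGIN { printf "%.2f\n", size_gb * 8 / params_b }'
}

# Sizes copied from the quantization table.
for entry in "Q8_0:227" "Q4_K_M:129" "IQ3_S:92" "Q2_K:78"; do
  name="${entry%%:*}"
  size="${entry##*:}"
  printf '%-7s %s bits/weight\n' "$name" "$(bits_per_weight "$size" 230)"
done
```

The estimate averages over all tensors, so it does not correspond exactly to the nominal bit width in a quantization's name.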