---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
- gguf
- quantized
- apex
- moe
- mixture-of-experts
- minimax
---
# MiniMax-M2.7 APEX GGUF

**APEX (Adaptive Precision for EXpert Models)** quantizations of [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7).

**Brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team** | [APEX Project](https://github.com/mudler/apex-quant) | [Technical Report](https://github.com/mudler/apex-quant/blob/main/paper/APEX_Technical_Report.pdf)
> **Status: Re-quantization in progress.** The previous quants had a conversion bug: our direct FP8→BF16 path produced broken logits. We have identified the issue and are re-quantizing, this time using unsloth's pre-converted BF16 GGUF as the source. Working quants will be back shortly.
## About APEX

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers keep higher precision, while middle layers are compressed more aggressively. I-variants use a diverse imatrix calibration set.

See the [APEX project](https://github.com/mudler/apex-quant) for full details, the technical report, and scripts.
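The role-plus-depth idea above can be sketched in a few lines. This is an illustrative mock-up, not the actual apex-quant implementation: the function name, thresholds, and quant-type choices are assumptions; only the overall scheme (attention kept high precision, routed experts compressed hardest, edge layers spared relative to middle layers) comes from the description.

```python
N_LAYERS = 62  # MiniMax-M2.7 depth

def apex_quant_type(role: str, layer: int, n_layers: int = N_LAYERS) -> str:
    """Pick a GGUF quant type for a tensor from its role and layer position.

    `depth` is 0.0 at the first/last layer and approaches 1.0 in the middle,
    so the precision gradient falls off toward the middle of the stack.
    """
    depth = min(layer, n_layers - 1 - layer) / (n_layers / 2)
    if role == "attention":
        return "Q6_K"                       # attention stays high precision
    if role == "shared_expert":
        return "Q5_K" if depth < 0.5 else "Q4_K"
    if role == "routed_expert":
        # routed experts hold most of the parameters: compress them hardest
        if depth < 0.2:
            return "Q4_K"                   # near the edges of the network
        return "Q3_K" if depth < 0.7 else "Q2_K"
    return "Q8_0"                           # embeddings, norms, output head

print(apex_quant_type("routed_expert", 0))   # edge layer  -> Q4_K
print(apex_quant_type("routed_expert", 31))  # middle layer -> Q2_K
```

The design point is that routed-expert tensors dominate the parameter count in a 256-expert MoE, so pushing them to 2-3 bits in the middle layers buys most of the size reduction while the edges and attention preserve quality.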
## Architecture

- **Model**: MiniMax-M2.7 (MiniMaxM2)
- **Layers**: 62
- **Experts**: 256 routed (8 active per token)
- **Total Parameters**: ~228B
- **Active Parameters**: ~10B per token
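As a quick sanity check on the numbers above, the sparsity of the MoE follows directly from them (the variable names are just for illustration):

```python
# Active/total ratio for MiniMax-M2.7, using the figures from this card.
total_params = 228e9   # ~228B total
active_params = 10e9   # ~10B active per token
n_experts, n_active = 256, 8

print(f"active fraction: {active_params / total_params:.1%}")
print(f"experts used per token: {n_active}/{n_experts}")
```

Only about 4.4% of the weights participate in any single forward pass, which is why quantizing the (mostly idle) routed experts aggressively has a limited effect on per-token quality.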
## Credits

APEX is brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team.