---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
- gguf
- quantized
- apex
- moe
- mixture-of-experts
- minimax
---
# MiniMax-M2.7 APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of MiniMax-M2.7.

Brought to you by the LocalAI team | APEX Project | Technical Report
**Status:** Re-quantization in progress. The previous quants had a conversion bug: our direct FP8→BF16 path produced broken logits. We have identified the issue and are now quantizing from unsloth's pre-converted BF16 GGUF instead; working quants will be back shortly.
## About APEX
APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers keep higher precision, while middle layers are compressed more aggressively. I-variants are calibrated with an importance matrix (imatrix) built from a diverse dataset.
See the APEX project for full details, technical report, and scripts.
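As a rough illustration of the role/gradient idea, here is a hypothetical sketch (not the project's actual code; the tensor-name patterns, quant type names, and thresholds are all assumptions chosen for illustration):

```python
# Hypothetical sketch of an APEX-style policy: classify each tensor by role,
# then pick a quant type from a layer-wise precision gradient. Middle layers
# of the routed experts are compressed harder; always-active paths
# (attention, shared experts) keep higher precision.

def tensor_role(name: str) -> str:
    # Name patterns are illustrative, loosely following GGUF conventions.
    if any(p in name for p in (".ffn_gate_exps", ".ffn_up_exps", ".ffn_down_exps")):
        return "routed_expert"
    if "shexp" in name:
        return "shared_expert"
    if "attn" in name:
        return "attention"
    return "other"

def pick_quant(layer: int, n_layers: int, role: str) -> str:
    # depth is 0 at the first/last layer and approaches 1 mid-stack.
    depth = min(layer, n_layers - 1 - layer) / (n_layers / 2)
    if role == "routed_expert":
        return "Q3_K" if depth > 0.5 else "Q5_K"  # compress middle experts harder
    if role in ("shared_expert", "attention"):
        return "Q6_K"                             # keep always-active paths higher
    return "Q8_0"

print(pick_quant(0, 62, "routed_expert"))   # edge layer -> Q5_K
print(pick_quant(31, 62, "routed_expert"))  # middle layer -> Q3_K
```

The thresholds and types here stand in for whatever gradient a given APEX variant actually uses; see the project scripts for the real mapping.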
## Architecture
- Model: MiniMax-M2.7 (MiniMaxM2)
- Layers: 62
- Experts: 256 routed (8 active per token)
- Total Parameters: ~228B
- Active Parameters: ~10B per token
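The figures above imply a rough parameter split. A back-of-envelope sketch, assuming the simple model `active = always_active + routed * (8 / 256)` (the derived split is an estimate, not an official figure):

```python
# Back-of-envelope split of MiniMax-M2.7 parameters from the card's figures.
# Assumes: total = shared + routed, active = shared + routed * (n_active / n_experts),
# where "shared" covers attention and any always-on weights. Derived, not official.
total = 228e9        # total parameters (from the card)
active = 10e9        # active parameters per token (from the card)
n_experts, n_active = 256, 8

frac = n_active / n_experts                # 8/256 = 3.125% of routed weights fire
routed = (total - active) / (1 - frac)     # solve the two equations above
shared = total - routed

print(f"routed-expert params: ~{routed / 1e9:.0f}B")   # ~225B
print(f"always-active params: ~{shared / 1e9:.0f}B")   # ~3B
```

This is why MoE quantization policies like APEX focus their budget on the routed experts: they hold nearly all of the weights, yet only a few percent of them are active for any given token.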
## Credits
APEX is brought to you by the LocalAI team.