---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
  - gguf
  - quantized
  - apex
  - moe
  - mixture-of-experts
  - minimax
---

# MiniMax-M2.7 APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of MiniMax-M2.7.

Brought to you by the LocalAI team | APEX Project | Technical Report

**Status:** Re-quantization in progress. The previous quants had a conversion bug: our direct FP8→BF16 path produced broken logits. We've identified the issue, switched to unsloth's pre-converted BF16 GGUF as the conversion source, and are re-quantizing. Working quants will be back shortly.
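For context, the source checkpoint's FP8 weights use the E4M3 layout (1 sign, 4 exponent, 3 mantissa bits, bias 7), and a direct cast to BF16 has to decode every field correctly; a subtle error there silently corrupts all weights. Below is a minimal pure-Python sketch of E4M3 decoding, purely illustrative and not the project's actual conversion code:

```python
def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3 value (1 sign, 4 exponent, 3 mantissa bits, bias 7).

    Illustrative sketch of the format a direct FP8->BF16 path must handle;
    not the actual conversion code used for these quants.
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        # E4M3 has no infinities; only this bit pattern is NaN
        return float("nan")
    if exp == 0:
        # subnormal: no implicit leading 1, fixed exponent 2^-6
        return sign * (man / 8.0) * 2.0 ** -6
    # normal: implicit leading 1, biased exponent
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

# 0b0_0111_000: exponent field 7 (unbiased 0), mantissa 0 -> 1.0
assert fp8_e4m3_to_float(0b00111000) == 1.0
```

Getting the subnormal and NaN cases wrong is exactly the kind of mistake that yields plausible-looking tensors with broken logits downstream.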

## About APEX

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient — edge layers get higher precision, middle layers get more aggressive compression. I-variants use diverse imatrix calibration.
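The classify-then-gradient idea can be sketched as follows. The tensor-name patterns, edge-layer threshold, and quant-type choices here are hypothetical placeholders to illustrate the structure, not APEX's actual recipe:

```python
def apex_pick_quant(tensor_name: str, layer_idx: int, n_layers: int = 62) -> str:
    """Pick a quant type by tensor role and layer depth.

    All name patterns, thresholds, and quant types below are hypothetical
    illustrations of the APEX idea, not its real configuration.
    """
    # 1. Classify the tensor by role (hypothetical GGUF-style name patterns).
    if "shexp" in tensor_name:
        role = "shared_expert"
    elif "exps" in tensor_name:
        role = "routed_expert"
    else:
        role = "attention"

    # 2. Edge layers (first/last few) keep higher precision.
    edge = layer_idx < 4 or layer_idx >= n_layers - 4

    # 3. Precision gradient: routed experts compress hardest in the middle;
    #    edges and non-routed tensors stay at higher precision.
    if role == "routed_expert":
        return "Q5_K" if edge else "Q3_K"
    if role == "shared_expert":
        return "Q6_K"
    return "Q8_0" if edge else "Q6_K"
```

For example, `apex_pick_quant("blk.30.ffn_down_exps.weight", 30)` lands on the aggressive middle-layer choice, while the same tensor in layer 0 keeps higher precision.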

See the APEX project for full details, technical report, and scripts.

## Architecture

- **Model:** MiniMax-M2.7 (MiniMaxM2)
- **Layers:** 62
- **Experts:** 256 routed (8 active per token)
- **Total Parameters:** ~228B
- **Active Parameters:** ~10B per token
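These numbers can be sanity-checked: with 8 of 256 routed experts active, only 1/32 of the routed-expert parameters participate per token. The non-expert split used below is an assumption for illustration (the card only states the ~228B / ~10B totals):

```python
def active_params_b(total_b: float, non_expert_b: float,
                    n_experts: int = 256, n_active: int = 8) -> float:
    """Estimate active parameters per token for a routed-MoE model.

    non_expert_b (attention, embeddings, shared components) is an assumed
    split; the model card only states the totals.
    """
    expert_b = total_b - non_expert_b        # parameters inside routed experts
    return non_expert_b + expert_b * n_active / n_experts

# Assuming ~3B of the ~228B sit outside the routed experts:
# 3 + 225 * 8/256 = 10.03125, consistent with the ~10B active figure.
print(active_params_b(228, 3))
```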

## Credits

APEX is brought to you by the LocalAI team.