# MiniMax-M2.5 — SWAN Mixed-Precision (4-bit avg)
This is MiniMax-M2.5 quantized using SWAN (Statistical Weight Analysis for N-bit allocation) — a data-free per-tensor mixed-precision quantization method for MLX on Apple Silicon.
## Key Features
- Data-free quantization: No calibration dataset required — uses weight statistics only
- Per-tensor bit allocation: Each tensor is assigned 2, 4, 8, or 16 bits based on sensitivity analysis
- MLX native: Ready for inference on Apple Silicon via `mlx_lm`
## Results
| Metric | SWAN (this model) | Uniform 4-bit | SWAN vs Uniform |
|---|---|---|---|
| PPL (WikiText-2, mean) | 8.787 | 8.957 | -1.9% |
| PPL (WikiText-2, median) | 9.169 | 9.399 | -2.4% |
| PPL (WikiText-2, trimmed) | 8.748 | 8.924 | -2.0% |
| Model size | 118 GB | 120 GB | -1.7% |
| Peak memory | 121 GB | 123 GB | -1.6% |
Evaluation config: WikiText-2 test split, sequence length 2048, 256 samples, seed 42.
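The three perplexity summaries above (mean, median, trimmed) can be reproduced from per-sample losses. A minimal sketch, using hypothetical per-sample negative log-likelihoods and assuming a 10% symmetric trim (the card does not state the exact trim fraction used):

```python
import math
import statistics

def aggregate_ppl(nll_per_sample, trim_frac=0.1):
    """Turn per-sample negative log-likelihoods into the three
    perplexity summaries reported in the table. The 10% trim
    fraction is an assumption, not SWAN's documented setting."""
    ppls = sorted(math.exp(nll) for nll in nll_per_sample)
    n = len(ppls)
    k = int(n * trim_frac)
    trimmed = ppls[k:n - k] if k else ppls
    return {
        "mean": statistics.fmean(ppls),
        "median": statistics.median(ppls),
        "trimmed": statistics.fmean(trimmed),
    }

# Hypothetical per-sample NLLs, for illustration only
stats = aggregate_ppl([2.1, 2.2, 2.3, 2.0, 2.4, 2.2, 2.1, 2.3, 2.5, 1.9])
```

The trimmed mean discards the most extreme samples on both ends, which makes the summary less sensitive to a few pathological sequences.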
## Usage

```bash
pip install mlx-lm

# Generate text
python -m mlx_lm.generate \
  --model baa-ai/MiniMax-M2.5-SWAN-4bit \
  --prompt "Hello, how are you?"

# Interactive chat
python -m mlx_lm.chat --model baa-ai/MiniMax-M2.5-SWAN-4bit
```
## Quantization Details
| Setting | Value |
|---|---|
| Source model | MiniMaxAI/MiniMax-M2.5 (FP8, 229B params, 10B active) |
| Method | SWAN v4 (adaptive normalization + optimized thresholds) |
| Normalization | Adaptive (percentile-based, optimal for MoE) |
| Thresholds | t2=0.20, t8=0.75, t16=0.95 (grid-search optimized) |
| Average bits | 3.77 bpw |
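The thresholds in the table drive the per-tensor bit decision. A minimal sketch of one plausible mapping, assuming sensitivity scores are normalized to [0, 1] and that lower scores earn fewer bits (the direction is an assumption consistent with the thresholds above, not SWAN's published rule):

```python
def assign_bits(score, t2=0.20, t8=0.75, t16=0.95):
    """Map a normalized sensitivity score in [0, 1] to a bit-width
    using the grid-searched thresholds from the table. Hypothetical
    illustration of the allocation logic."""
    if score < t2:
        return 2       # insensitive tensor: compress hard
    if score >= t16:
        return 16      # highly sensitive: protect at full precision
    if score >= t8:
        return 8       # sensitive: keep at FP8
    return 4           # default bucket

widths = [assign_bits(s) for s in (0.05, 0.50, 0.80, 0.97)]
# widths is [2, 4, 8, 16]
```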
### Bit Distribution
| Precision | Parameters | Percentage |
|---|---|---|
| 2-bit (group_size=32) | 90.7B | 39.7% |
| 4-bit (group_size=128) | 115.6B | 50.5% |
| 8-bit (kept at FP8) | 17.6B | 7.7% |
| 16-bit (protected) | 4.8B | 2.1% |
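The reported 3.77 bpw average follows directly from this table as a parameter-weighted mean of the bit-widths:

```python
# Parameter counts (in billions) and bit-widths from the table above
params = {2: 90.7, 4: 115.6, 8: 17.6, 16: 4.8}

total = sum(params.values())                                # ~228.7B
avg_bpw = sum(bits * n for bits, n in params.items()) / total
# round(avg_bpw, 2) gives 3.77, matching the reported average
```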
## Hardware Requirements
- Apple Silicon with at least 128 GB unified memory (192 GB recommended)
- Peak memory during inference: ~121 GB
## About SWAN
SWAN computes four sensitivity metrics per tensor:
- SVD spectral concentration
- Excess kurtosis
- Output noise amplification
- Reconstruction error proxy (NRMSE)
These are combined into a composite score that drives automatic bit-width allocation — without any calibration data.
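A minimal sketch of the scoring idea: compute a tail-heaviness statistic per tensor (excess kurtosis is one of the four metrics named above) and fold pre-normalized metrics into one composite score. The equal weights are an assumption for illustration; the card does not publish SWAN's actual weighting.

```python
import statistics

def excess_kurtosis(x):
    """Sample excess kurtosis of a weight tensor (flattened).
    Heavy-tailed weight distributions (high kurtosis) are harder
    to represent at low bit-widths."""
    mu = statistics.fmean(x)
    var = statistics.fmean([(v - mu) ** 2 for v in x])
    m4 = statistics.fmean([(v - mu) ** 4 for v in x])
    return m4 / (var ** 2) - 3.0

def composite_score(metrics, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine four per-tensor sensitivity metrics, each assumed
    pre-normalized to [0, 1], into one score. Equal weights are a
    hypothetical choice, not SWAN's documented configuration."""
    return sum(w * m for w, m in zip(weights, metrics))
```

The composite score is then compared against the thresholds from the Quantization Details table to pick each tensor's bit-width.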
Paper: SWAN: Data-Free Mixed-Precision Quantization for LLMs via Multi-Metric Sensitivity Analysis (Black Sheep AI Research, 2026)