
MiniMax-M2.5-SWAN-4bit-MLX

Mixed-precision quantized version of MiniMaxAI/MiniMax-M2.5 using SWAN.

SWAN beats uniform 4-bit quantization on both axes: 1.9% lower WikiText-2 perplexity at 1.7% smaller size.

Metrics

| Metric | Value |
|---|---|
| Size | 118 GB |
| Average bits | 3.77 |
| Framework | MLX |
| WikiText-2 PPL | 8.787 (mean) |
| Uniform 4-bit PPL | 8.957 |
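The reported perplexities are the exponential of the mean token-level negative log-likelihood over WikiText-2. A minimal sketch of the metric itself (the per-token NLLs below are made-up illustrative values, not measurements from this model):

```python
import math

def perplexity(nlls):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token NLLs; a real evaluation streams WikiText-2
# through the model and collects one NLL per predicted token.
nlls = [2.3, 2.0, 2.2, 2.1]
print(round(perplexity(nlls), 3))  # ≈ 8.585
```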

Rate-Distortion Curve

Quality vs. size trade-off from the MINT MCKP allocator; the highlighted point on the curve marks the optimal knee.

| Budget | Size | Avg Bits | Loss |
|---|---|---|---|
| 95 GB | 95.0 GB | 3.1 | 155.2783 |
| 114 GB | 114.2 GB | 4.0 | 22.3760 |
| 133 GB | 133.4 GB | 4.5 | 14.9543 |
| 153 GB | 152.6 GB | 5.2 | 10.2674 |
| 172 GB | 171.8 GB | 6.0 | 5.5744 |
| 191 GB | 191.1 GB | 6.8 | 3.1894 |
| 210 GB | 210.3 GB | 7.6 | 1.9911 |
| 230 GB | 229.5 GB | 8.1 | 1.0657 |
| 249 GB | 247.8 GB | 8.9 | 0.9171 |
| 268 GB | 267.8 GB | 9.7 | 0.8132 |
| 287 GB | 286.8 GB | 10.4 | 0.7152 |
| 306 GB | 305.8 GB | 11.2 | 0.6174 |
| 326 GB | 324.7 GB | 11.9 | 0.5197 |
| 345 GB | 343.7 GB | 12.7 | 0.4221 |
| 364 GB | 363.8 GB | 13.5 | 0.3191 |
| 383 GB | 382.8 GB | 14.3 | 0.2216 |
| 402 GB | 401.7 GB | 15.0 | 0.1242 |
| 422 GB | 420.7 GB | 15.8 | 0.0269 |
| 441 GB | 426.0 GB | 16.0 | 0.0000 |
| 460 GB | 426.0 GB | 16.0 | 0.0000 |

Generated by MINT rate-distortion optimization.
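A knee in a curve like the one above can be located mechanically: sweep the budgets in order and stop where the marginal loss reduction per extra GB falls below a threshold. This is only an illustrative sketch over the first few rows of the table; the 0.1 loss-per-GB cutoff is an arbitrary choice for demonstration, not MINT's actual MCKP criterion:

```python
# (size_gb, loss) pairs taken from the rate-distortion table above.
curve = [
    (95.0, 155.2783), (114.2, 22.3760), (133.4, 14.9543),
    (152.6, 10.2674), (171.8, 5.5744), (191.1, 3.1894),
    (210.3, 1.9911),
]

def knee(points, min_gain_per_gb=0.1):
    """Return the first point after which the marginal loss reduction
    per extra GB drops below the threshold -- a simple knee proxy."""
    for (s0, l0), (s1, l1) in zip(points, points[1:]):
        if (l0 - l1) / (s1 - s0) < min_gain_per_gb:
            return (s0, l0)
    return points[-1]

print(knee(curve))
```

Different thresholds pick different knees, which is why the final operating point is a policy decision on top of the curve, not a property of the curve alone.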

Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("baa-ai/MiniMax-M2.5-SWAN-4bit-MLX")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```

About SWAN

SWAN uses data-free per-tensor sensitivity analysis with composite scoring to allocate bit-widths across model layers.
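SWAN's actual composite score is not documented here, so the sketch below only illustrates the general shape of sensitivity-driven allocation: score each tensor with a data-free statistic (here, a hypothetical max-to-RMS magnitude ratio, so outlier-heavy tensors rank higher), then greedily promote the most sensitive tensors to wider bit-widths while an average-bit budget holds. Tensor names, the scoring function, and the 4.5-bit budget are all assumptions for the example:

```python
import math
import random

# Toy stand-ins for model weights; layer3 gets an injected outlier so it
# ranks as the most sensitive tensor.
random.seed(0)
tensors = {f"layer{i}.weight": [random.gauss(0, 1) for _ in range(4096)]
           for i in range(4)}
tensors["layer3.weight"][0] = 50.0

def sensitivity(w):
    """Data-free proxy score: max weight magnitude relative to RMS.
    (A stand-in for SWAN's composite score, which is not shown here.)"""
    rms = math.sqrt(sum(x * x for x in w) / len(w))
    return max(abs(x) for x in w) / rms

def allocate(tensors, bits=(3, 4, 6), avg_budget=4.5):
    """Greedy allocation: start every tensor at the lowest width, then
    promote tensors in descending sensitivity while the average bit-width
    stays within budget."""
    order = sorted(tensors, key=lambda k: sensitivity(tensors[k]), reverse=True)
    alloc = {k: bits[0] for k in tensors}
    for level in bits[1:]:
        for k in order:
            if (sum(alloc.values()) - alloc[k] + level) / len(alloc) <= avg_budget:
                alloc[k] = level
    return alloc

print(allocate(tensors))  # the outlier-heavy tensor ends up with the most bits
```

The same mechanism explains the model card's non-integer average (3.77 bits): per-tensor widths differ, so the mean falls between the discrete levels.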


Quantized by baa.ai
