---
language:
  - en
license: mit
pipeline_tag: text-generation
tags:
  - mlx
  - mixture-of-experts
  - moe
  - pruning
  - reap
  - minimax
  - 4bit
  - quantized
  - apple-silicon
library_name: mlx
base_model: Akicou/MiniMax-M2-5-REAP-29
---

# MiniMax-M2.5 REAP-29 — MLX 4-bit

MLX 4-bit quantized version of [Akicou/MiniMax-M2-5-REAP-29](https://huggingface.co/Akicou/MiniMax-M2-5-REAP-29) for efficient local inference on Apple Silicon.

- **Quantization:** 4-bit (group size 64, affine mode; router gates kept at 8-bit; see the conversion sketch below)
- **Architecture:** MiniMax M2.5 MoE — 62 layers, 180 experts (REAP-pruned from 256), 8 active per token
- **Context:** 196K tokens
- **Size:** ~85 GB (the full weights must fit in unified memory, so a Mac with more than 85 GB of unified memory is required)
- **Pruning:** 29% of experts removed via REAP (Router-weighted Expert Activation Pruning)

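The quantization above can be reproduced with mlx-lm's converter. The snippet below is a sketch, not the exact command used for this card: the `quant_predicate` hook and the `"gate"` path match are assumptions about how the router gates were kept at 8-bit, and module path names vary by model.

```python
# Sketch: 4-bit conversion with 8-bit router gates via mlx-lm.
# Assumes a recent mlx-lm with quant_predicate support; the "gate"
# path match below is a guess at this model's router-gate module names.
from mlx_lm import convert

def keep_router_gates_8bit(path, module, config):
    if "gate" in path:  # hypothetical match for MoE router gates
        return {"bits": 8, "group_size": 64}
    return True  # quantize everything else with the default 4-bit settings

convert(
    "Akicou/MiniMax-M2-5-REAP-29",
    mlx_path="MiniMax-M2.5-REAP-29-mlx-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
    quant_predicate=keep_router_gates_8bit,
)
```
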
## Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("shieldstackllc/MiniMax-M2.5-REAP-29-mlx-4bit")
response = generate(model, tokenizer, prompt="Hello!", verbose=True)
```

Or run it with vMLX for native macOS inference.
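
For chat-style prompting, apply the tokenizer's chat template first (the standard mlx-lm pattern; the template itself ships with the base model):

```python
from mlx_lm import load, generate

model, tokenizer = load("shieldstackllc/MiniMax-M2.5-REAP-29-mlx-4bit")

# Build a prompt from chat messages using the model's own template.
messages = [{"role": "user", "content": "Explain expert pruning in one paragraph."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# max_tokens caps the response length.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```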

## About

MiniMax-M2.5 is a large Mixture-of-Experts language model by MiniMax AI. This variant was pruned by Akicou using REAP (Router-weighted Expert Activation Pruning), which removed 29% of the experts and reduced model size and memory footprint while maintaining strong performance. MLX quantization by vMLX.
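
For intuition only, a toy sketch of router-weighted expert scoring follows; it is not the algorithm from the REAP paper, and every name and number in it is illustrative:

```python
# Toy illustration of REAP-style scoring: rank experts by the average
# router gate weight times the expert output norm over calibration
# tokens, then drop the lowest-scoring experts.
import numpy as np

rng = np.random.default_rng(0)
tokens, n_experts, top_k = 4096, 256, 8

# Fake calibration statistics: each token activates top_k experts;
# gate weight is zero for unselected experts.
ranks = rng.random((tokens, n_experts)).argsort(axis=1).argsort(axis=1)
gates = rng.random((tokens, n_experts)) * (ranks < top_k)
output_norms = rng.random((tokens, n_experts))

saliency = (gates * output_norms).mean(axis=0)  # one score per expert
n_prune = round(n_experts * 0.296)              # 256 -> 180, as in this model
keep = np.sort(np.argsort(saliency)[n_prune:])  # surviving expert indices
print(f"keeping {keep.size} of {n_experts} experts")
```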

## Made for vMLX

This model was converted and optimized for vMLX — a free, open-source, macOS-native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration.

## Credits

- Base model: MiniMax AI
- REAP pruning: Akicou
- MLX quantization: vMLX

## Contact

For questions, issues, or collaboration: admin@vmlx.net