---
language:
  - en
license: mit
pipeline_tag: text-generation
tags:
  - mlx
  - mixture-of-experts
  - moe
  - pruning
  - reap
  - minimax
  - 8bit
  - quantized
  - apple-silicon
library_name: mlx
base_model: Akicou/MiniMax-M2-5-REAP-19
---

MiniMax-M2.5 REAP-19 — MLX 8-bit

MLX 8-bit quantized version of Akicou/MiniMax-M2-5-REAP-19 for efficient local inference on Apple Silicon.

  • Quantization: 8-bit (9.0 bits per weight, group size 64, affine mode; see the sketch after this list)
  • Architecture: MiniMax M2.5 MoE — 62 layers, 205 experts (REAP-pruned from 256), 8 active per token
  • Context: 196K tokens
  • Size: ~193 GB
  • Pruning: 19% of experts removed via REAP (Router-weighted Expert Activation Pruning)
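
In affine mode, every group of 64 weights stores an integer code per weight plus a per-group scale and bias; the headline 9.0 bits per weight exceeds 8 because that per-group metadata (and any layers kept at higher precision) is counted in. Below is a minimal NumPy sketch of group-wise affine quantization; it illustrates the arithmetic only and is not MLX's internal implementation.

import numpy as np

def quantize_affine(w, bits=8, group_size=64):
    # Affine (asymmetric) quantization: one scale/bias pair
    # per group of `group_size` weights.
    qmax = 2**bits - 1
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)      # per-group minimum -> bias
    hi = w.max(axis=1, keepdims=True)      # per-group maximum
    scale = (hi - lo) / qmax               # step between quantization levels
    q = np.clip(np.round((w - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize_affine(q, scale, bias):
    # Reconstruct an approximation of the original weights.
    return q * scale + bias

w = np.random.randn(4096).astype(np.float32)
q, scale, bias = quantize_affine(w)
err = np.abs(w - dequantize_affine(q, scale, bias).reshape(-1)).max()
print(f"worst-case rounding error: {err:.5f}")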

Usage

from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer.
model, tokenizer = load("shieldstackllc/MiniMax-M2-5-REAP-19-mlx-8bit")

# Generate a completion; verbose=True streams tokens as they are produced.
response = generate(model, tokenizer, prompt="Hello!", verbose=True)
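
For instruction-style prompts, the usual mlx-lm pattern is to apply the tokenizer's chat template first (max_tokens is an optional cap on generation length; the prompt text here is just an example):

messages = [{"role": "user", "content": "Explain REAP pruning in one paragraph."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)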

Or with vMLX for native macOS inference.

About

MiniMax-M2.5 is a large Mixture-of-Experts language model by MiniMax AI. This variant was pruned by Akicou using REAP (Router-weighted Expert Activation Pruning), removing 19% of the experts to reduce model size and memory footprint while maintaining strong performance. MLX quantization by vMLX.
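
As a schematic illustration of the idea (not Akicou's actual pipeline; all names below are hypothetical): REAP-style methods score each expert by its router-weighted activation over a calibration set, then drop the lowest-scoring experts and re-index the router.

import numpy as np

def reap_scores(gate_probs, expert_out_norms):
    # gate_probs: (tokens, experts) router weights after top-k masking
    # (zero wherever a token was not routed to that expert).
    # expert_out_norms: (tokens, experts) norm of each expert's output.
    # Returns one saliency score per expert.
    return (gate_probs * expert_out_norms).mean(axis=0)

def experts_to_keep(scores, prune_frac=0.19):
    # Keep the highest-scoring experts; drop the bottom `prune_frac`.
    n_keep = int(round(len(scores) * (1 - prune_frac)))
    return np.sort(np.argsort(scores)[-n_keep:])

scores = reap_scores(np.random.rand(1024, 256), np.random.rand(1024, 256))
keep = experts_to_keep(scores)   # indices of the experts to retain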

Also Available

A 4-bit MLX quantization of this model is also available for lower memory use.

Made for vMLX

This model was converted and optimized for vMLX, a free, open-source, macOS-native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration.

Credits

  • MiniMax AI: original MiniMax-M2.5 model
  • Akicou: REAP expert pruning (Akicou/MiniMax-M2-5-REAP-19)
  • vMLX: MLX 8-bit quantization

Contact

For questions, issues, or collaboration: admin@vmlx.net