---
language:
  - en
license: mit
pipeline_tag: text-generation
tags:
  - mlx
  - mixture-of-experts
  - moe
  - pruning
  - reap
  - minimax
  - 8bit
  - quantized
  - apple-silicon
library_name: mlx
base_model: Akicou/MiniMax-M2-5-REAP-19
---

MiniMax-M2.5 REAP-19 — MLX 8-bit

MLX 8-bit quantized version of Akicou/MiniMax-M2-5-REAP-19 for efficient local inference on Apple Silicon.

  • Quantization: 8-bit (9.0 bits per weight, group size 64, affine mode; see the sketch after this list)
  • Architecture: MiniMax M2.5 MoE — 62 layers, 205 experts (REAP-pruned from 256), 8 active per token
  • Context: 196K tokens
  • Size: ~193 GB
  • Pruning: 19% of experts removed via REAP (Router-weighted Expert Activation Pruning)
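
In affine mode, every group of 64 weights stores an integer code per weight plus a per-group scale and bias; the headline 9.0 bits per weight exceeds 8 because that per-group metadata (and any layers kept at higher precision) is counted in. Below is a minimal NumPy sketch of group-wise affine quantization; it illustrates the arithmetic only and is not MLX's internal implementation.

import numpy as np

def quantize_affine(w, bits=8, group_size=64):
    # Affine (asymmetric) quantization: one scale/bias pair
    # per group of `group_size` weights.
    qmax = 2**bits - 1
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)      # per-group minimum -> bias
    hi = w.max(axis=1, keepdims=True)      # per-group maximum
    scale = (hi - lo) / qmax               # step between quantization levels
    q = np.clip(np.round((w - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize_affine(q, scale, bias):
    # Reconstruct an approximation of the original weights.
    return q * scale + bias

w = np.random.randn(4096).astype(np.float32)
q, scale, bias = quantize_affine(w)
err = np.abs(w - dequantize_affine(q, scale, bias).reshape(-1)).max()
print(f"worst-case rounding error: {err:.5f}")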

Usage

from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer.
model, tokenizer = load("shieldstackllc/MiniMax-M2-5-REAP-19-mlx-8bit")

# Generate a completion; verbose=True streams tokens as they are produced.
response = generate(model, tokenizer, prompt="Hello!", verbose=True)
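
For instruction-style prompts, the usual mlx-lm pattern is to apply the tokenizer's chat template first (max_tokens is an optional cap on generation length; the prompt text here is just an example):

messages = [{"role": "user", "content": "Explain REAP pruning in one paragraph."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)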

Or with vMLX for native macOS inference.

About

MiniMax-M2.5 is a large Mixture-of-Experts language model by MiniMax AI. This variant was pruned by Akicou using REAP (Router-weighted Expert Activation Pruning), removing 19% of the experts to reduce model size and memory footprint while maintaining strong performance. MLX quantization by vMLX.
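
As a schematic illustration of the idea (not Akicou's actual pipeline; all names below are hypothetical): REAP-style methods score each expert by its router-weighted activation over a calibration set, then drop the lowest-scoring experts and re-index the router.

import numpy as np

def reap_scores(gate_probs, expert_out_norms):
    # gate_probs: (tokens, experts) router weights after top-k masking
    # (zero wherever a token was not routed to that expert).
    # expert_out_norms: (tokens, experts) norm of each expert's output.
    # Returns one saliency score per expert.
    return (gate_probs * expert_out_norms).mean(axis=0)

def experts_to_keep(scores, prune_frac=0.19):
    # Keep the highest-scoring experts; drop the bottom `prune_frac`.
    n_keep = int(round(len(scores) * (1 - prune_frac)))
    return np.sort(np.argsort(scores)[-n_keep:])

scores = reap_scores(np.random.rand(1024, 256), np.random.rand(1024, 256))
keep = experts_to_keep(scores)   # indices of the experts to retain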

Also Available

A 4-bit MLX quantization of this model is also available for lower memory use.

Made for vMLX

This model was converted and optimized for vMLX, a free, open-source, macOS-native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration.

Credits

  • MiniMax AI: original MiniMax-M2.5 model
  • Akicou: REAP expert pruning (Akicou/MiniMax-M2-5-REAP-19)
  • vMLX: MLX 8-bit quantization

Contact

For questions, issues, or collaboration: admin@vmlx.net