--- license: other base_model: MiniMaxAI/MiniMax-M3 tags: [mlx, vmlx, jang, reap, awq, moe, code, multimodal, minimax-m3, apple-silicon] pipeline_tag: text-generation ---

A JANG-quantized MiniMax-M3 — coding/agentic + multimodal — for the vMLX engine (Apple Silicon / MLX).
> ⚠️ **Requires vMLX engine v1.5.67 or newer.** > This is a **JANG-format** model (JANG affine-mixed + **AWQ** quantization, **REAP** expert pruning, and the > MiniMax-M3 MSA / Lightning-Indexer runtime). It will **NOT** load with `transformers`, `vLLM`, or generic MLX > loaders — it needs vMLX's JANG loader + the M3 runtime. **Coder support lands in vMLX ≥ 1.5.67.** ## What is a JANG model? **JANG** is vMLX's quantization + packing format: mixed-precision **affine** quantization (per-projection bit widths) + **AWQ** activation-aware scaling + **REAP** expert pruning, described by a `jang_config.json`. Weights stay quantized in GPU memory and are loaded by vMLX's JANG loader. Because the format **and** the MiniMax-M3 runtime (MSA dual-cache, Lightning Indexer, partial RoPE, vision tower) are vMLX-specific, **these models run only on vMLX ≥ 1.5.67.** ## Run it 1. Install/update **vMLX 1.5.67+** — https://mlx.studio (or `pip install -U vmlx`). 2. App: **Server → New Session →** pick/download this model **→ Start →** chat. 3. CLI: `vmlx-engine serve JANGQ-AI/MiniMax-M3-REAP22-Coder --reasoning-parser minimax_m3 --tool-call-parser minimax_m3` ## Highlights - **Coding: HumanEval pass@1 = 100%** (81/81 on a scrambled half of HumanEval, first-sample) — pass@5 = 1.000. - **Arithmetic/reasoning recovered** vs the base REAP quant (with reasoning enabled): ~7/7 on a 7-task probe. - **Multimodal (vision) kept.** ~107 GB on disk. ## Build - **Base:** MiniMaxAI/MiniMax-M3 (60 layers, MoE, MSA Lightning Indexer, GQA, partial RoPE). - **REAP pruning:** keep **100/128** routed experts per MoE layer (22% pruned), saliency-scored. - **JANG affine quant (group_size 64):** routed gate/up = **2-bit + AWQ pre-scaling**, down = 2-bit; shared experts 6-bit; attention 8-bit; embeddings 6-bit; lm_head 8-bit; Lightning Indexer + norms FP16; vision 8-bit. - **"Floor" expert recipe:** protect the proven coding experts (coding saliency) + add top math experts, so coding stays intact while math improves. - **Calibration:** Vera (agentic-coder) dominant + GSM8K (math reasoning). ## Attribution - Base model: **MiniMaxAI/MiniMax-M3** - Expert pruning: **REAP** (Cerebras, ICLR 2026, arXiv:2510.13999) - **Vera agentic-coder calibration dataset + evaluation/testing: [@hornsman1](https://huggingface.co/hornsman1) (hornsan1 on GitHub)** - Additional math-reasoning calibration: GSM8K - Quantization & runtime: **JANG / vMLX** ## Credits - Vera dataset & model testing: **@hornsby_andrew** ([hornsan1](https://github.com/hornsan1) on GitHub)