MLX Studio — native JANG support with reasoning
MiniMax M2.5 (227B-A21B) — JANG_3L (3.08-bit) — Reasoning
JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX
JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.
Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.
Why JANG models?
Tools like mlx-lm and oMLX (oQ) can quantize models, but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, and which models need bfloat16 to avoid NaNs. We don't just quantize; we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.
Results: 93.5% MMLU (200 Questions, Smart Two-Pass)
| Subject | Score |
|---|---|
| Abstract Algebra | 11/20 (55%) |
| Anatomy | 19/20 (95%) |
| Astronomy | 20/20 (100%) |
| College CS | 20/20 (100%) |
| College Physics | 19/20 (95%) |
| HS Biology | 20/20 (100%) |
| HS Chemistry | 19/20 (95%) |
| HS Mathematics | 20/20 (100%) |
| Logical Fallacies | 19/20 (95%) |
| World Religions | 20/20 (100%) |
| Total | 187/200 (93.5%) |
Pass 1 (no-thinking): 158/200 (79.0%) | Pass 2 (reasoning retry): +29 recovered
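The "smart two-pass" scheme above can be sketched as follows. The pass structure (cheap no-thinking pass first, reasoning retries only on the misses) is from this card; the function and toy answerers are illustrative, not the actual harness.

```python
# Sketch of the smart two-pass MMLU scheme: a cheap no-thinking pass,
# then reasoning retries on the questions that pass 1 missed.
def two_pass_eval(questions, answer_fast, answer_reasoning):
    correct, misses = 0, []
    for q, gold in questions:
        if answer_fast(q) == gold:
            correct += 1                 # pass 1: no-thinking answer
        else:
            misses.append((q, gold))     # queue for the reasoning retry
    recovered = sum(1 for q, gold in misses if answer_reasoning(q) == gold)
    return correct, recovered, correct + recovered

# Toy run: pass 1 gets 2/3, the reasoning retry recovers the third.
qs = [("1+1", "2"), ("2+3", "5"), ("7*8", "56")]
fast = {"1+1": "2", "2+3": "5", "7*8": "54"}.get
slow = {"7*8": "56"}.get
print(two_pass_eval(qs, fast, slow))   # (2, 1, 3)
```

Retrying only the misses keeps the expensive reasoning pass small: here it matches the card's numbers, 158 pass-1 correct plus 29 recovered giving 187/200.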
JANG vs MLX — MiniMax M2.5
| Model | MMLU | Size | Speed | Notes |
|---|---|---|---|---|
| JANG_3L (this model) | 93.5% | 82 GB | 41 tok/s | 5 subjects at 100% |
| JANG_2L | 74.0% | 63 GB | 48 tok/s | Smallest working MiniMax |
| MLX 4-bit | 26.5% | 91 GB | ~50 tok/s | Broken — random answers |
| MLX 3-bit | 24.5% | 69 GB | — | Broken — random answers |
| MLX 2-bit | 25.0% | 46 GB | — | Broken — random answers |
MLX quantization is broken on MiniMax at ALL bit levels (~25% accuracy is random chance on four-option MMLU). JANG is currently the ONLY working quantization for MiniMax M2.5 on Apple Silicon.
Key Features
- 93.5% MMLU — five subjects at 100%
- 41 tok/s generation on M3 Ultra
- 82 GB on disk — fits 96+ GB Macs
- 227B total / 21B active — 256 MoE experts, top-8 routing
- Reasoning mode: `<think>...</think>` step-by-step reasoning
- Sigmoid + bias routing: MiniMax-specific MoE (not softmax)
- FP8 source with block-wise 128x128 scales
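A minimal sketch of the sigmoid + bias top-k routing named above, to show how it differs from softmax gating. The function name, tiny expert count, and zero bias are illustrative assumptions, not MiniMax's actual routing code.

```python
import math

# Sketch of sigmoid + bias top-k expert routing (MiniMax-style, per the
# card). Illustrative only; not the model's actual implementation.
def sigmoid_bias_topk(logits, bias, k=8):
    # Each expert is gated independently with a sigmoid; there is no
    # softmax across experts, so gate values do not sum to 1.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The bias affects which experts are selected (load balancing),
    # not the gate values that weight their outputs.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i])
    chosen = sorted(ranked[-k:])
    return chosen, [scores[i] for i in chosen]

# Toy example: 6 experts, top-2 routing (M2.5 uses 256 experts, top-8).
experts, gates = sigmoid_bias_topk([2.0, -1.0, 0.5, 3.0, -2.0, 0.0],
                                   bias=[0.0] * 6, k=2)
print(experts)   # [0, 3]  (the two highest biased scores)
```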
Important Notes
- Temperature must be 1.0 — greedy decoding (temp=0) causes infinite thinking loops on MiniMax
- Tokenizer: Known-good tokenizer included (mlx_lm.convert corrupts MiniMax tokenizer)
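Reasoning output arrives wrapped in `<think>...</think>` tags, so downstream code usually wants to split the chain-of-thought from the final answer. A minimal sketch (the regex approach is an assumption, not an official parser):

```python
import re

def split_reasoning(text):
    """Split a <think>-tagged completion into (reasoning, answer)."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()          # no reasoning block emitted
    reasoning = m.group(1).strip()       # the chain-of-thought
    answer = text[m.end():].strip()      # everything after </think>
    return reasoning, answer

r, a = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(a)  # The answer is 4.
```

`re.DOTALL` matters because the reasoning block normally spans many lines.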
Architecture
227B total parameters, 21B active per token
- 64 layers, all MoE (256 experts, top-8 routing)
- Sigmoid + bias expert routing (non-normalized)
- GQA attention: 48 heads, 8 KV heads
- FP8 E4M3 source with block-wise scales
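Block-wise scales mean each 128x128 tile of a weight matrix carries one scale, and dequantization multiplies every element by its tile's scale. A toy sketch under that assumption, shrunk to 2x2 tiles for readability:

```python
# Toy block-wise dequantization: one scale per (block x block) tile.
# MiniMax's FP8 E4M3 source uses 128x128 tiles; block=2 here is only
# for illustration, and plain ints stand in for FP8 values.
def dequant_blockwise(q, scales, block=2):
    rows, cols = len(q), len(q[0])
    return [
        [q[i][j] * scales[i // block][j // block] for j in range(cols)]
        for i in range(rows)
    ]

q = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
scales = [[0.5, 2.0]]            # one scale per 2x2 tile
print(dequant_blockwise(q, scales))
# [[0.5, 1.0, 6.0, 8.0], [2.5, 3.0, 14.0, 16.0]]
```

Per-tile scales keep the quantization error local: an outlier only inflates the scale of its own 128x128 block instead of the whole tensor.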
Install
pip install jang[mlx]
Created by Jinho Jang — jangq.ai — @dealignai
Base model: MiniMaxAI/MiniMax-M2.5 (this repo is a JANG_3L quantization of it)