MiniMax M2.5 (227B-A21B) — JANG_3L (3.08-bit) — Reasoning

MLX Studio — native JANG support with reasoning

JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX


JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.


Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.


Why JANG models?

Tools like mlx-lm, oMLX (oQ), and others can quantize models — but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, which models need bfloat16 to avoid NaN. We don't just quantize — we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.


Results: 93.5% MMLU (200 Questions, Smart Two-Pass)

| Subject | Score |
|---|---|
| Abstract Algebra | 11/20 (55%) |
| Anatomy | 19/20 (95%) |
| Astronomy | 20/20 (100%) |
| College CS | 20/20 (100%) |
| College Physics | 19/20 (95%) |
| HS Biology | 20/20 (100%) |
| HS Chemistry | 19/20 (95%) |
| HS Mathematics | 20/20 (100%) |
| Logical Fallacies | 19/20 (95%) |
| World Religions | 20/20 (100%) |
| **Total** | **187/200 (93.5%)** |

Pass 1 (no-thinking): 158/200 (79.0%) | Pass 2 (reasoning retry): +29 recovered
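The two-pass scheme above can be sketched as a small harness: pass 1 answers every question with thinking disabled, pass 2 retries only the misses with reasoning enabled. This is an illustration, not the actual benchmark code; `ask` is a hypothetical stand-in for a model call.

```python
def smart_two_pass(questions, ask):
    """Two-pass MMLU harness sketch.

    `questions` is a list of (prompt, gold_letter) pairs.
    `ask(prompt, thinking=...)` is a hypothetical model-call stand-in
    that returns a single answer letter.
    """
    pass1_correct, misses = 0, []
    for prompt, gold in questions:          # pass 1: no-thinking, fast
        if ask(prompt, thinking=False) == gold:
            pass1_correct += 1
        else:
            misses.append((prompt, gold))
    recovered = sum(                        # pass 2: reasoning retry on misses only
        ask(prompt, thinking=True) == gold
        for prompt, gold in misses
    )
    return pass1_correct, recovered, pass1_correct + recovered
```

The retry pass only pays the reasoning-token cost on questions the fast pass got wrong, which is how 158/200 plus 29 recoveries yields the 187/200 total.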

JANG vs MLX — MiniMax M2.5

| Model | MMLU | Size | Speed | Notes |
|---|---|---|---|---|
| JANG_3L (this model) | 93.5% | 82 GB | 41 tok/s | 5 subjects at 100% |
| JANG_2L | 74.0% | 63 GB | 48 tok/s | Smallest working MiniMax |
| MLX 4-bit | 26.5% | 91 GB | ~50 tok/s | Broken — random answers |
| MLX 3-bit | 24.5% | 69 GB | — | Broken — random answers |
| MLX 2-bit | 25.0% | 46 GB | — | Broken — random answers |

MLX quantization is broken on MiniMax at ALL bit levels (~25% MMLU, i.e. random chance on four-option questions). JANG is the only working quantization for MiniMax M2.5 on Apple Silicon.

Key Features

  • 93.5% MMLU — five subjects at 100%
  • 41 tok/s generation on M3 Ultra
  • 82 GB on disk — fits 96+ GB Macs
  • 227B total / 21B active — 256 MoE experts, top-8 routing
  • Reasoning mode: <think>...</think> step-by-step reasoning
  • Sigmoid + bias routing: MiniMax-specific MoE (not softmax)
  • FP8 source with block-wise 128x128 scales
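Reasoning mode wraps the chain of thought in `<think>...</think>` tags, as noted above. A minimal sketch of separating the trace from the final answer (the tag format comes from this card; the helper name is ours):

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Split a reasoning-mode completion into (thinking, answer).

    Everything inside <think>...</think> is the chain of thought;
    what remains after stripping the tags is the final answer.
    """
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return thinking, answer
```

Useful when you want to log or display the reasoning trace separately instead of showing it inline with the answer.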

Important Notes

  • Temperature must be 1.0 — greedy decoding (temp=0) causes infinite thinking loops on MiniMax
  • Tokenizer: Known-good tokenizer included (mlx_lm.convert corrupts MiniMax tokenizer)

Architecture

227B total parameters, 21B active per token
- 64 layers, all MoE (256 experts, top-8 routing)
- Sigmoid + bias expert routing (non-normalized)
- GQA attention: 48 heads, 8 KV heads
- FP8 E4M3 source with block-wise scales
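The sigmoid + bias router differs from the usual softmax top-k: each expert gets an independent sigmoid score, a per-expert bias is added before selection, and the selected gate values are left unnormalized, per the "non-normalized" note above. A pure-Python sketch under those assumptions (expert count and top-8 come from this card; the function itself is illustrative, not the model's code):

```python
import math

def sigmoid_bias_route(logits, bias, top_k=8):
    """MoE routing sketch: sigmoid scores plus per-expert bias.

    Selection: top-k experts ranked by sigmoid(logit) + bias.
    Gating: the raw sigmoid scores of the chosen experts, left
    unnormalized (they need not sum to 1).
    """
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]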

Install

pip install jang[mlx]

Created by Jinho Jang | jangq.ai | @dealignai
