CRITICAL FIX (2026-03-19): Fixed chat template for thinking toggle. Re-download if you experience infinite loops.
Update (2026-03-18): Models have been updated to v2.1.0 with proper tokenizer and fixed configs. If you downloaded before this date, please re-download for full MLX Studio compatibility.
MLX Studio — the only app that natively supports JANG models.
Early adoption: LM Studio, Ollama, oMLX, and Inferencer do not support JANG yet. Use MLX Studio or `pip install "jang[mlx]"`, and ask your favorite app's creators to add JANG support!
# MiniMax-M2.5 — JANG_3M (3.07-bit, 8-bit attention)
JANG (Jang Adaptive N-bit Grading): mixed-precision quantization for Apple Silicon
JANG is fully open-source. Quantization engine, research, and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.
## Results (200-question MMLU)
| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_3M (3.07-bit) | 74.5% | 82 GB | 48.3 tok/s |
| JANG_2L (2.10-bit) | 74.0% | 63 GB | 50.9 tok/s |
| MLX 4-bit | 26.5% | 91 GB | ~50 tok/s |
| MLX 3-bit | 24.5% | — | — |
| MLX 2-bit | 25.0% | — | — |
JANG is currently the only working quantization for MiniMax: MLX uniform quantization is broken at every bit width (~25%, i.e. random chance on four-option MMLU).
## Per-Subject Scores (200 questions, temp=0.0, thinking ON)
| Subject | Score |
|---|---|
| Abstract Algebra | 10/20 |
| Anatomy | 15/20 |
| Astronomy | 18/20 |
| College CS | 10/20 |
| College Physics | 17/20 |
| HS Biology | 18/20 |
| HS Chemistry | 16/20 |
| HS Mathematics | 12/20 |
| Logical Fallacies | 16/20 |
| World Religions | 17/20 |
| Total | 149/200 = 74.5% |
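As a quick sanity check, the per-subject scores above sum to the reported total. A trivial Python verification:

```python
# Per-subject correct counts out of 20, copied from the table above.
scores = {
    "Abstract Algebra": 10, "Anatomy": 15, "Astronomy": 18,
    "College CS": 10, "College Physics": 17, "HS Biology": 18,
    "HS Chemistry": 16, "HS Mathematics": 12,
    "Logical Fallacies": 16, "World Religions": 17,
}
total = sum(scores.values())
accuracy = total / (20 * len(scores))
print(f"{total}/200 = {accuracy:.1%}")  # 149/200 = 74.5%
```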
## Specs
| Metric | Value |
|---|---|
| Source | MiniMax-M2.5 |
| Architecture | MoE (256 experts, 8 active), standard attention, 62 layers |
| Profile | JANG_3M (CRITICAL=8, IMPORTANT=3, COMPRESS=3) |
| Average bits | 3.07 |
| GPU Memory | 88.5 GB |
| Disk Size | 82 GB |
| Speed | 48.3 tok/s (M4 Ultra 256 GB) |
| group_size | 128 (experts) / 64 (router) |
| Temperature | 1.0 required (greedy decoding loops unless thinking is disabled) |
| Format | v2 (MLX-native, instant load) |
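The 3.07-bit average in the profile is a parameter-weighted mean of per-tensor bit widths (CRITICAL tensors kept at 8-bit, the rest at 3-bit). The exact split is internal to the JANG profile; the fractions below are hypothetical, chosen only to illustrate the arithmetic:

```python
def average_bits(groups):
    """Parameter-weighted mean bit width over quantization groups.

    groups: list of (param_share, bits) pairs; shares need not sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    return sum(share * bits for share, bits in groups) / total_share

# Hypothetical split: ~1.4% of params (e.g. attention/router) at 8-bit,
# the remaining ~98.6% (expert MLPs) at 3-bit -> 3.07-bit average.
print(round(average_bits([(1.4, 8), (98.6, 3)]), 2))  # 3.07
```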
## Important Notes
- Temperature must be 1.0: greedy decoding (temp=0) causes infinite thinking loops
- top_p=0.95, top_k=40 recommended
- Thinking can be toggled via `enable_thinking=True/False` in the chat template
- `enable_thinking=False` + temp=0.0 works for MMLU benchmarks
Note: MiniMax-M2.5 is a text-only model (no vision encoder). Use `load_jang_model()` for inference.
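The thinking toggle can be wired into prompt construction. This is a minimal sketch assuming the tokenizer's chat template forwards an `enable_thinking` keyword, as the notes above state; the `build_prompt` helper name is ours, not part of the jang API:

```python
def build_prompt(tokenizer, question, thinking=True):
    """Render the chat template, toggling the model's reasoning mode.

    `enable_thinking` is forwarded to the Jinja chat template; per the
    notes above, thinking=False (with temp=0.0) suits MMLU-style runs.
    """
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,
    )
```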
## Install

```shell
pip install "jang[mlx]"
```
## Quick Start

```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/MiniMax-M2.5-JANG_3M")
sampler = make_sampler(temp=1.0, top_p=0.95)  # temp=1.0 required (see notes)

tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, "item") else int(tok)
    print(tokenizer.decode([t]), end="", flush=True)
    if t == tokenizer.eos_token_id:
        break
```
## Links
- GitHub | HuggingFace | MLX Studio | PyPI | Format Spec
## Korean Summary (한국어)

MiniMax-M2.5 — JANG_3M

JANG is the only working quantization format for MiniMax. MLX uniform quantization is broken at all bit levels (~25% = random chance).

| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_3M | 74.5% | 82 GB | 48.3 tok/s |
| JANG_2L | 74% | 63 GB | 50.9 tok/s |

JANG protects the MoE router at 8-bit while compressing the expert MLPs to 3-bit.

GitHub · HuggingFace · MLX Studio · PyPI

Created by Jinho Jang · jangq.ai · @dealignai
## Model tree for JANGQ-AI/MiniMax-M2.5-JANG_3M
Base model: MiniMaxAI/MiniMax-M2.5
