Instructions to use JANGQ-AI/MiniMax-M2.5-JANG_3L with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use JANGQ-AI/MiniMax-M2.5-JANG_3L with MLX:
```python
# Make sure mlx-lm is installed:
#   pip install --upgrade mlx-lm
# Generate text with mlx-lm
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("JANGQ-AI/MiniMax-M2.5-JANG_3L")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# MiniMax needs temperature 1.0; greedy decoding (the default) can loop while thinking
sampler = make_sampler(temp=1.0)
text = generate(model, tokenizer, prompt=prompt, sampler=sampler, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use JANGQ-AI/MiniMax-M2.5-JANG_3L with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "JANGQ-AI/MiniMax-M2.5-JANG_3L"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "JANGQ-AI/MiniMax-M2.5-JANG_3L" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use JANGQ-AI/MiniMax-M2.5-JANG_3L with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "JANGQ-AI/MiniMax-M2.5-JANG_3L"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default JANGQ-AI/MiniMax-M2.5-JANG_3L
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use JANGQ-AI/MiniMax-M2.5-JANG_3L with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "JANGQ-AI/MiniMax-M2.5-JANG_3L"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "JANGQ-AI/MiniMax-M2.5-JANG_3L"
# In another terminal, call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "JANGQ-AI/MiniMax-M2.5-JANG_3L",
    "temperature": 1.0,
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
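For scripting, the same chat request can be sent from Python. This is a minimal sketch using only the standard library (`json`, `urllib.request`); it assumes `mlx_lm.server` is running locally on its default port 8080, and the helper names `build_chat_request` and `chat` are hypothetical, not part of mlx-lm. It sets `temperature` to 1.0, following the Important Notes for MiniMax.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "JANGQ-AI/MiniMax-M2.5-JANG_3L"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       temperature: float = 1.0,
                       max_tokens: int = 2048) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # MiniMax needs 1.0; greedy (0) can loop while thinking
        "max_tokens": max_tokens,    # leave room for the <think>...</think> reasoning
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request to the running server and return the assistant's reply text."""
    req = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Hello"))` returns the model's reply, including its reasoning block.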
MLX Studio — native JANG support with reasoning
MiniMax M2.5 (227B-A21B) — JANG_3L (3.08-bit) — Reasoning
JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX
JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.
Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.
Why JANG models?
Tools like mlx-lm, oMLX (oQ), and others can quantize models — but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, which models need bfloat16 to avoid NaN. We don't just quantize — we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.
Results: 93.5% MMLU (200 Questions, Smart Two-Pass)
| Subject | Score |
|---|---|
| Abstract Algebra | 11/20 (55%) |
| Anatomy | 19/20 (95%) |
| Astronomy | 20/20 (100%) |
| College CS | 20/20 (100%) |
| College Physics | 19/20 (95%) |
| HS Biology | 20/20 (100%) |
| HS Chemistry | 19/20 (95%) |
| HS Mathematics | 20/20 (100%) |
| Logical Fallacies | 19/20 (95%) |
| World Religions | 20/20 (100%) |
| Total | 187/200 (93.5%) |
Pass 1 (no thinking): 158/200 (79.0%) | Pass 2 (reasoning retry on the 42 misses): +29 recovered, for 187/200 total
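The reported totals can be cross-checked with a few lines of arithmetic. This sketch assumes pass 2 re-asks only the questions missed in pass 1 (our reading of "reasoning retry"); the variable names are illustrative.

```python
# Per-subject correct answers from the table above (20 questions per subject).
SUBJECT_SCORES = {
    "Abstract Algebra": 11,
    "Anatomy": 19,
    "Astronomy": 20,
    "College CS": 20,
    "College Physics": 19,
    "HS Biology": 20,
    "HS Chemistry": 19,
    "HS Mathematics": 20,
    "Logical Fallacies": 19,
    "World Religions": 20,
}
QUESTIONS_PER_SUBJECT = 20
PASS1_CORRECT = 158    # pass 1: answered without thinking
PASS2_RECOVERED = 29   # pass 2: misses fixed by the reasoning retry

total_questions = QUESTIONS_PER_SUBJECT * len(SUBJECT_SCORES)
table_total = sum(SUBJECT_SCORES.values())
two_pass_total = PASS1_CORRECT + PASS2_RECOVERED
retried = total_questions - PASS1_CORRECT      # questions sent to pass 2 (assumed: misses only)
still_wrong = retried - PASS2_RECOVERED
perfect = sum(score == QUESTIONS_PER_SUBJECT for score in SUBJECT_SCORES.values())

print(f"table total:    {table_total}/{total_questions} ({table_total / total_questions:.1%})")
print(f"two-pass total: {two_pass_total}/{total_questions}")
print(f"pass 2 retried {retried}, recovered {PASS2_RECOVERED}, still wrong {still_wrong}")
print(f"subjects at 100%: {perfect}")
```

Both routes give the same 187/200, and the 13 questions still wrong after pass 2 match the 200 − 187 gap.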
JANG vs MLX — MiniMax M2.5
| Model | MMLU | Size | Speed | Notes |
|---|---|---|---|---|
| JANG_3L (this model) | 93.5% | 82 GB | 41 tok/s | 5 subjects at 100% |
| JANG_2L | 74.0% | 63 GB | 48 tok/s | Smallest working MiniMax |
| MLX 4-bit | 26.5% | 91 GB | ~50 tok/s | Broken — random answers |
| MLX 3-bit | 24.5% | 69 GB | — | Broken — random answers |
| MLX 2-bit | 25.0% | 46 GB | — | Broken — random answers |
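Standard MLX quantization is broken on MiniMax at every tested bit width (2, 3, and 4-bit): all three score about 25% on MMLU, which is random chance for four-option questions. JANG is the ONLY working quantization for MiniMax M2.5 on Apple Silicon.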
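Why ~25% means "random": MMLU questions have four answer options, so blind guessing averages 25%. A quick normal approximation to the binomial shows how far pure guessing can drift on a 200-question run. This is a back-of-the-envelope sketch (z = 1.96 gives an approximate 95% band), not the evaluation harness behind the table.

```python
import math


def chance_band(n_questions: int, n_choices: int = 4, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% accuracy range for random guessing (normal approximation to the binomial)."""
    p = 1.0 / n_choices                          # probability of guessing one question right
    mean = n_questions * p                       # expected number correct
    sd = math.sqrt(n_questions * p * (1.0 - p))  # binomial standard deviation
    return (mean - z * sd) / n_questions, (mean + z * sd) / n_questions


# MMLU accuracies from the comparison table above (200 questions each)
REPORTED = {"JANG_3L": 0.935, "JANG_2L": 0.740,
            "MLX 4-bit": 0.265, "MLX 3-bit": 0.245, "MLX 2-bit": 0.250}

low, high = chance_band(200)
print(f"guessing band: {low:.1%} to {high:.1%}")
for name, acc in REPORTED.items():
    verdict = "indistinguishable from guessing" if low <= acc <= high else "above chance"
    print(f"{name}: {acc:.1%} ({verdict})")
```

On 200 questions the guessing band is roughly 19% to 31%, so all three MLX results fall inside it, while both JANG profiles sit far above it.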
Key Features
- 93.5% MMLU — five subjects at 100%
- 41 tok/s generation on M3 Ultra
- 82 GB on disk — fits 96+ GB Macs
- 227B total / 21B active — 256 MoE experts, top-8 routing
- Reasoning mode: `<think>...</think>` step-by-step reasoning
- Sigmoid + bias routing: MiniMax-specific MoE (not softmax)
- FP8 source with block-wise 128x128 scales
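Sigmoid + bias routing differs from the softmax top-k gating used by many MoE models: each expert gets an independent sigmoid score, a per-expert bias is added only when choosing the top 8 of 256 experts, and the gate weights come from the unbiased scores. The sketch below follows that DeepSeek-V3-style pattern as an assumption; it is not MiniMax's or JANG's code. The card only says the routing is non-normalized, so renormalizing the 8 selected weights is left as an optional `normalize` flag (off by default).

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits, bias, k=8, normalize=False):
    """Choose top-k experts for one token with sigmoid + bias routing (illustrative sketch).

    logits: router logits, one per expert (256 for MiniMax M2.5)
    bias:   per-expert bias; affects which experts are chosen, not their weights
    Returns (expert_indices, gate_weights).
    """
    scores = [sigmoid(z) for z in logits]           # independent per-expert scores, no softmax
    biased = [s + b for s, b in zip(scores, bias)]  # bias only shifts the selection
    top = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:k]
    weights = [scores[i] for i in top]              # gates use the unbiased sigmoid scores
    if normalize:                                   # optional renormalization (assumption)
        total = sum(weights)
        weights = [w / total for w in weights]
    return top, weights


# Example: one token routed over 256 experts with random logits and zero bias
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(256)]
experts, gates = route_token(logits, [0.0] * 256)
print(experts, [round(g, 3) for g in gates])
```

Because the bias never enters the gate weights, it can be tuned to balance expert load without changing how much each chosen expert contributes.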
Important Notes
- Temperature must be 1.0 — greedy decoding (temp=0) causes infinite thinking loops on MiniMax
- Tokenizer: Known-good tokenizer included (mlx_lm.convert corrupts MiniMax tokenizer)
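A generic sketch of temperature sampling shows what the temperature setting changes. At temperature 0 the sampler always takes the single most likely token, so one plausible reading of the looping problem is that greedy decoding has no way out of a repeating thinking pattern, while at 1.0 it samples from the model's unmodified distribution. This is illustrative pure Python, not mlx-lm internals; in mlx-lm the corresponding setting is `make_sampler(temp=1.0)` from `mlx_lm.sample_utils`.

```python
import math
import random


def sample_token(logits, temperature: float, rng: random.Random) -> int:
    """Pick a token id from logits; temperature 0 means greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # always the top token
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                # subtract max for numerical stability
    weights = [math.exp(z - m) for z in scaled]    # unnormalized softmax probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [1.0, 3.0, 2.0]
greedy = [sample_token(logits, 0.0, rng) for _ in range(5)]      # identical every time
sampled = [sample_token(logits, 1.0, rng) for _ in range(1000)]  # mostly 1, sometimes 0 or 2
print(greedy, {t: sampled.count(t) for t in range(3)})
```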
Architecture
227B total parameters, 21B active per token
- 64 layers, all MoE (256 experts, top-8 routing)
- Sigmoid + bias expert routing (non-normalized)
- GQA attention: 48 heads, 8 KV heads
- FP8 E4M3 source with block-wise scales
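FP8 E4M3 with block-wise scales means each source weight is stored as one byte (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits) and each 128x128 tile of a weight matrix carries one higher-precision scale. The sketch below assumes the OCP "E4M3FN" encoding (no infinities, maximum 448) and that each tile's scale is multiplied into the decoded values; the function names are hypothetical, and JANG's actual conversion pipeline may differ.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (OCP "E4M3FN": 1 sign, 4 exponent bits with bias 7, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan                                # only NaN pattern; E4M3FN has no infinities
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** -6     # subnormal values
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def dequantize_blockwise(q_bytes, scales, block=128):
    """Dequantize a 2-D FP8 weight matrix that has one scale per (block x block) tile."""
    return [
        [decode_e4m3(b) * scales[i // block][j // block] for j, b in enumerate(row)]
        for i, row in enumerate(q_bytes)
    ]


# Tiny example with 2x2 tiles instead of 128x128: every byte encodes 1.0 (0x38)
weights = dequantize_blockwise([[0x38] * 4 for _ in range(4)], [[1.0, 2.0], [3.0, 4.0]], block=2)
print(weights)
```

Each output value is the decoded byte times the scale of the tile it falls in, so the four 2x2 quadrants of the example come out as 1, 2, 3, and 4.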
Install
```shell
# Quote the extra so zsh (the macOS default shell) does not treat [mlx] as a glob
pip install "jang[mlx]"
```
Created by Jinho Jang — jangq.ai — @dealignai
Model tree for JANGQ-AI/MiniMax-M2.5-JANG_3L: quantized from base model MiniMaxAI/MiniMax-M2.5