MiniMax M2.5 (227B-A21B) — JANG_3L (3.08-bit) — Reasoning

MLX Studio — native JANG support with reasoning

JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX


JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.


Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.


Why JANG models?

Tools like mlx-lm, oMLX (oQ), and others can quantize models — but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, which models need bfloat16 to avoid NaN. We don't just quantize — we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.


Results: 93.5% MMLU (200 Questions, Smart Two-Pass)

| Subject | Score |
|---|---|
| Abstract Algebra | 11/20 (55%) |
| Anatomy | 19/20 (95%) |
| Astronomy | 20/20 (100%) |
| College CS | 20/20 (100%) |
| College Physics | 19/20 (95%) |
| HS Biology | 20/20 (100%) |
| HS Chemistry | 19/20 (95%) |
| HS Mathematics | 20/20 (100%) |
| Logical Fallacies | 19/20 (95%) |
| World Religions | 20/20 (100%) |
| **Total** | **187/200 (93.5%)** |

Pass 1 (no-thinking): 158/200 (79.0%) | Pass 2 (reasoning retry): +29 recovered
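The two-pass scheme above can be sketched as a small harness: pass 1 answers every question with thinking disabled, pass 2 retries only the misses with reasoning enabled. This is an illustration, not the actual benchmark code; `ask` is a hypothetical stand-in for a model call.

```python
def smart_two_pass(questions, ask):
    """Two-pass MMLU harness sketch.

    `questions` is a list of (prompt, gold_letter) pairs.
    `ask(prompt, thinking=...)` is a hypothetical model-call stand-in
    that returns a single answer letter.
    """
    pass1_correct, misses = 0, []
    for prompt, gold in questions:          # pass 1: no-thinking, fast
        if ask(prompt, thinking=False) == gold:
            pass1_correct += 1
        else:
            misses.append((prompt, gold))
    recovered = sum(                        # pass 2: reasoning retry on misses only
        ask(prompt, thinking=True) == gold
        for prompt, gold in misses
    )
    return pass1_correct, recovered, pass1_correct + recovered
```

The retry pass only pays the reasoning-token cost on questions the fast pass got wrong, which is how 158/200 plus 29 recoveries yields the 187/200 total.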

JANG vs MLX — MiniMax M2.5

| Model | MMLU | Size | Speed | Notes |
|---|---|---|---|---|
| JANG_3L (this model) | 93.5% | 82 GB | 41 tok/s | 5 subjects at 100% |
| JANG_2L | 74.0% | 63 GB | 48 tok/s | Smallest working MiniMax |
| MLX 4-bit | 26.5% | 91 GB | ~50 tok/s | Broken — random answers |
| MLX 3-bit | 24.5% | 69 GB | — | Broken — random answers |
| MLX 2-bit | 25.0% | 46 GB | — | Broken — random answers |

MLX quantization is broken on MiniMax at ALL bit levels (~25% MMLU, i.e. random chance on four-option questions). JANG is the only working quantization for MiniMax M2.5 on Apple Silicon.

Key Features

  • 93.5% MMLU — five subjects at 100%
  • 41 tok/s generation on M3 Ultra
  • 82 GB on disk — fits 96+ GB Macs
  • 227B total / 21B active — 256 MoE experts, top-8 routing
  • Reasoning mode: <think>...</think> step-by-step reasoning
  • Sigmoid + bias routing: MiniMax-specific MoE (not softmax)
  • FP8 source with block-wise 128x128 scales
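Reasoning mode wraps the chain of thought in `<think>...</think>` tags, as noted above. A minimal sketch of separating the trace from the final answer (the tag format comes from this card; the helper name is ours):

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Split a reasoning-mode completion into (thinking, answer).

    Everything inside <think>...</think> is the chain of thought;
    what remains after stripping the tags is the final answer.
    """
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return thinking, answer
```

Useful when you want to log or display the reasoning trace separately instead of showing it inline with the answer.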

Important Notes

  • Temperature must be 1.0 — greedy decoding (temp=0) causes infinite thinking loops on MiniMax
  • Tokenizer: Known-good tokenizer included (mlx_lm.convert corrupts MiniMax tokenizer)

Architecture

227B total parameters, 21B active per token
- 64 layers, all MoE (256 experts, top-8 routing)
- Sigmoid + bias expert routing (non-normalized)
- GQA attention: 48 heads, 8 KV heads
- FP8 E4M3 source with block-wise scales
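The sigmoid + bias router differs from the usual softmax top-k: each expert gets an independent sigmoid score, a per-expert bias is added before selection, and the selected gate values are left unnormalized, per the "non-normalized" note above. A pure-Python sketch under those assumptions (expert count and top-8 come from this card; the function itself is illustrative, not the model's code):

```python
import math

def sigmoid_bias_route(logits, bias, top_k=8):
    """MoE routing sketch: sigmoid scores plus per-expert bias.

    Selection: top-k experts ranked by sigmoid(logit) + bias.
    Gating: the raw sigmoid scores of the chosen experts, left
    unnormalized (they need not sum to 1).
    """
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]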

Install

pip install jang[mlx]

Created by Jinho Jang | jangq.ai | @dealignai
