MiniMax-M3-Coder-Small

🦖 Osaurus Exclusive — a compact JANG-quantized MiniMax-M3 coder (coding · agentic · multimodal) for Apple Silicon.

⚠️ JANG-format model — runs on Osaurus. This uses the JANG quantization format (mixed-precision affine + AWQ + REAP expert pruning) and loads through Osaurus's native Swift runtime. It will NOT load with transformers, vLLM, or generic MLX loaders.

What is a JANG model?

JANG is a mixed-precision quantization + packing format — per-projection affine bit widths + AWQ activation-aware scaling + REAP expert pruning — described by a jang_config.json. Weights stay quantized in GPU memory. Osaurus loads it through its native Swift JANG runtime on Apple Silicon.
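To make the "affine bit widths" part concrete, here is a minimal sketch of generic group-wise affine quantization in plain Python. It is not the JANG implementation: the function names, the example group, and the min/max scaling rule are illustrative assumptions, and the AWQ scaling and REAP pruning steps are not modeled.

```python
# Generic affine quantization of one weight group (illustrative only; not JANG's code).

def quantize_group(weights, bits):
    """Map floats to unsigned integer codes in [0, 2**bits - 1] plus (scale, zero)."""
    levels = (1 << bits) - 1                 # 3 levels above zero for 2-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, zero):
    """Recover approximate floats from integer codes."""
    return [c * scale + zero for c in codes]

# Hypothetical 4-element group quantized at 2 bits, the width used for routed experts.
group = [-0.30, -0.10, 0.05, 0.30]
codes, scale, zero = quantize_group(group, bits=2)
restored = dequantize_group(codes, scale, zero)
max_error = max(abs(a - b) for a, b in zip(group, restored))
print(codes, round(max_error, 3))
```

A wider bit width shrinks `scale` and therefore the worst-case rounding error, which is one plausible reason to keep attention and lm_head at 8 bits while the routed experts drop to 2.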

Highlights

  • Smallest M3 coder — ~84 GB (the compact Osaurus build).
  • REAP45: keeps 70 of 128 routed experts (about 45% pruned).
  • All-2-bit routed experts + AWQ (gate/up 2-bit AWQ-scaled, down 2-bit); attention 8-bit, shared experts 6-bit, embeddings 6-bit, lm_head 8-bit, Lightning Indexer FP16.
  • Multimodal (vision) kept.
  • Calibration: Vera (agentic-coder) + GSM8K; "floor" recipe keeps the most-salient coding experts.
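The REAP45 figure in the list above follows directly from the expert counts. A short arithmetic check, using only the 70 and 128 stated in the list (the variable names are illustrative):

```python
# Expert counts taken from the Highlights list.
TOTAL_ROUTED_EXPERTS = 128
KEPT_ROUTED_EXPERTS = 70

pruned = TOTAL_ROUTED_EXPERTS - KEPT_ROUTED_EXPERTS
pruned_fraction = pruned / TOTAL_ROUTED_EXPERTS

summary = f"{pruned} of {TOTAL_ROUTED_EXPERTS} routed experts pruned ({pruned_fraction:.1%})"
print(summary)
```

The exact ratio is 58/128, which the "REAP45" label rounds to 45%.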

Benchmarks

  • HumanEval: pass@1 = 100% (82/82, scrambled-half adaptive eval, seed 42; 0 failures, 0 escalations).
  • Despite 45% expert pruning and all-2-bit routed experts, coding accuracy holds at 100% on this 82-problem run. The REAP45 keep-set is a subset of the larger M3-Coder builds' proven coding experts, so coding capability is preserved while the model shrinks to ~84 GB.
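For readers unfamiliar with the metric, the sketch below shows how a pass@1 score like the one above is commonly computed, using the standard unbiased pass@k estimator. It assumes one generated sample per problem; the card does not describe how the "adaptive" evaluation or its escalations work, so this is not the harness that produced the reported number.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0                      # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_1(results):
    """Average pass@1 over a list of (n_samples, n_correct) pairs, one per problem."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)

# Assumption: 82 problems, one sample each, all passing (matches the reported 82/82).
results = [(1, 1)] * 82
score = benchmark_pass_at_1(results)
print(f"pass@1 = {score:.0%}")
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved, so 82 of 82 gives 100%.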

Run it

Load it in Osaurus (Apple Silicon) — it runs on Osaurus's native Swift JANG runtime.

Attribution

  • Base model: MiniMaxAI/MiniMax-M3 · Pruning: REAP (Cerebras, arXiv:2510.13999)
  • Vera calibration + testing: @hornsman1 (hornsan1 on GitHub) · math calibration: GSM8K
  • Quantization: JANG · Runtime & distribution: Osaurus