Instructions to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-Small-JANGTQ"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OsaurusAI/MiniMax-M2.7-Small-JANGTQ" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-Small-JANGTQ"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OsaurusAI/MiniMax-M2.7-Small-JANGTQ
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "OsaurusAI/MiniMax-M2.7-Small-JANGTQ"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (mlx_lm.server listens on port 8080 by default)
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-Small-JANGTQ"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OsaurusAI/MiniMax-M2.7-Small-JANGTQ",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
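The same chat request can be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server is reachable at `http://localhost:8080/v1` (the address used in the Pi and Hermes configurations above); the helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "OsaurusAI/MiniMax-M2.7-Small-JANGTQ"


def build_chat_request(content, base_url="http://localhost:8080/v1", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content, **kwargs):
    """Send the request and return the text of the first choice.

    Assumes the response follows the OpenAI chat completions shape.
    """
    with urllib.request.urlopen(build_chat_request(content, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello")` should return the model's reply; pass `base_url=...` if the server was started on a different host or port.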
---
language:
- en
- zh
library_name: mlx
license: other
license_name: modified-mit
pipeline_tag: text-generation
base_model: MiniMaxAI/MiniMax-M2
tags:
- moe
- mixture-of-experts
- minimax_m2
- quantized
- apple-silicon
- mlx
- turboquant
- jangtq
- jangtq2
- reap
---
> ## ⚠️ REQUIRED: `jangtq_runtime.safetensors` sidecar must be downloaded
>
> Osaurus uses the native Swift JANGTQ runtime. **Every JANGTQ bundle on
> OsaurusAI ships a small `jangtq_runtime.safetensors` sidecar (~10 KB–~165 KB)
> alongside the weight shards.** If the file is absent, the Swift loader
> refuses to start and reports:
> ```
> Error: Model '<name>' declares JANGTQ (weight_format: "mxtq") but is
> missing required sidecar file 'jangtq_runtime.safetensors'.
> Re-download the full model or obtain the sidecar from the original
> publisher.
> ```
>
> If your local copy doesn't have it (older download, partial sync, etc.):
> ```bash
> hf download OsaurusAI/MiniMax-M2.7-Small-JANGTQ jangtq_runtime.safetensors --local-dir <your-dir>
> ```
> The file holds the deterministic codebooks and Hadamard rotation signs the
> Swift loader uses to decode `*.tq_packed` weights. It must match the seed
> the bundle was quantized with (`mxtq_seed=42`).
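The sidecar check can also be scripted before a model is handed to the loader. The sketch below is one possible approach, not part of Osaurus or the JANG toolchain: `missing_sidecar` and `ensure_sidecar` are hypothetical helper names, and the download step assumes the `huggingface_hub` package is installed.

```python
from pathlib import Path

SIDECAR = "jangtq_runtime.safetensors"
REPO_ID = "OsaurusAI/MiniMax-M2.7-Small-JANGTQ"


def missing_sidecar(model_dir):
    """Return True when the bundle directory lacks a non-empty sidecar file."""
    path = Path(model_dir) / SIDECAR
    return not (path.is_file() and path.stat().st_size > 0)


def ensure_sidecar(model_dir, repo_id=REPO_ID):
    """Download the sidecar into model_dir if it is missing; return its path."""
    if missing_sidecar(model_dir):
        # Imported lazily so the presence check works without huggingface_hub.
        from huggingface_hub import hf_hub_download

        hf_hub_download(repo_id=repo_id, filename=SIDECAR, local_dir=model_dir)
    return Path(model_dir) / SIDECAR
```

Calling `ensure_sidecar("<your-dir>")` is a no-op when the file is already present, so it is safe to run on every launch.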
<p align="center">
<a href="https://osaurus.ai"><img src="./osaurus-x-banner.png" alt="Osaurus AI"></a>
</p>
<h3 align="center">MiniMax M2.7 Small · 138B-A10B · JANGTQ (MLX)</h3>
<p align="center"><b>A ~138B-A10B MoE, 38 GB on disk</b> (down from MiniMax M2's ~460 GB / 230B base), produced by a 40% routed-expert prune plus 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency and JANGTQ2 codebook quantization: routed experts at 2-bit via Lloyd-Max codebooks with Hadamard rotation; attention, embed, lm_head, and dense MLP at 8-bit affine; norms and router at 16-bit.</p>
<p align="center">
<a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>
<a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>
<a href="https://huggingface.co/MiniMaxAI/MiniMax-M2"><img src="https://img.shields.io/badge/Base-MiniMax--M2-orange?logo=huggingface" alt="MiniMax M2"></a>
</p>

---
## Model Details

Runs on Apple Silicon via the JANG toolchain + MLX.

```
MiniMax M2 (base)
    ↓  v3 calibration corpus (code · agentic · general · academic · science · CN · cyber · systems · long-context)
    ↓
REAP saliency observer (62 layers × 256 experts → scoring)
    ↓  40% expert prune (154 of 256 kept per layer)
    ↓
JANGTQ2 quantization
    • 2-bit MXTQ on routed-expert weights (Hadamard-rotated Lloyd-Max codebook)
    • 8-bit affine on attention + dense MLP + embed + lm_head
    • 16-bit on norms and router weights
```
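To make the "Hadamard-rotated Lloyd-Max codebook" step concrete, here is a toy, pure-Python sketch of the general technique: flip signs with a seeded RNG, apply an orthonormal Walsh-Hadamard transform, fit a 4-level (2-bit) scalar codebook, then invert the rotation after dequantizing. It is not the JANGTQ2 implementation. The sign generation, the single global codebook, the vector length, and all function names are assumptions made for illustration.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def rotate(vec, signs):
    """Apply the sign flips, then the Hadamard transform."""
    return fwht([s * x for s, x in zip(signs, vec)])


def unrotate(vec, signs):
    """Invert rotate(): the orthonormal transform is its own inverse."""
    return [s * x for s, x in zip(signs, fwht(vec))]


def nearest(book, v):
    """Index of the codebook level closest to v."""
    return min(range(len(book)), key=lambda k: abs(book[k] - v))


def lloyd_max(values, levels=4, iters=25):
    """Fit a scalar codebook by alternating assignment and centroid updates."""
    lo, hi = min(values), max(values)
    book = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in book]
        for v in values:
            buckets[nearest(book, v)].append(v)
        book = [sum(b) / len(b) if b else c for b, c in zip(buckets, book)]
    return book


rng = random.Random(42)  # mirrors mxtq_seed=42; the RNG scheme itself is an assumption
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]
weights[3] = 12.0  # one outlier; the rotation spreads its energy over all coordinates
signs = [rng.choice((-1.0, 1.0)) for _ in weights]

rotated = rotate(weights, signs)
codebook = lloyd_max(rotated)                    # 4 levels = 2 bits per weight
codes = [nearest(codebook, v) for v in rotated]  # the indices a packed format would store
restored = unrotate([codebook[c] for c in codes], signs)
mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / len(weights)
```

Because both the signs and the codebook are needed to decode the stored indices, a decoder has to reproduce them exactly, which is why a deterministic seed matters in a scheme of this kind.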
| | Value |
|---|---|
| Parameters | **~138B total, ~10B active per token** |
| Routed experts kept | 154 of 256 (60%) |
| Top-k active experts | 8 per token |
| Layers | 62 |
| Bundle size | 38 GB |
| Dtype | bfloat16 activations |
| Attention | Standard Q/K/V + GQA 6:1, head_dim=128, rope_theta=5M |
| Context | 196,608 |
## Use

```python
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load_jangtq_model("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")

messages = [{"role": "user", "content": "Write a Python function that…"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

# Interleaved-thinking / always-reasoning. Use MiniMax's
# official sampling: temp=1.0, top_p=0.95, top_k=40
out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
               sampler=make_sampler(temp=1.0, top_p=0.95, top_k=40))
```
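Because the model always reasons before answering, the generated text usually needs to be split into reasoning and final answer. The helper below is a minimal sketch: it assumes the reasoning is delimited by `<think>` and `</think>` tags, tolerates a missing opening tag (the chat template may already have emitted it as part of the prompt), and only handles the first reasoning span. `split_reasoning` is a hypothetical name, not part of `jang_tools` or `mlx_lm`.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer) from generated text.

    Handles a full <think>...</think> span as well as output where only the
    closing tag appears because the opening tag was prefilled in the prompt.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag: either there is no reasoning at all, or the
        # reasoning was cut off before it finished (e.g. by max_tokens).
        if open_tag in text:
            return text.split(open_tag, 1)[1].strip(), ""
        return "", text.strip()
    reasoning = head.split(open_tag, 1)[-1].strip()
    return reasoning, tail.strip()
```

For example, `reasoning, answer = split_reasoning(out)` separates the two parts of the `out` string produced above; an empty `answer` suggests the token budget ran out during reasoning.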
## Evaluation

### HumanEval+ (code generation)

- **Dataset**: `evalplus/humanevalplus` test split (164 prompts, harder tests than HumanEval).
- **Protocol**: sampled pass@1 baseline, then a pass@5 retry on the problems that failed.
- **Sampling (pass@1 and pass@5 retry)**: temp=1.0, top_p=0.95, top_k=40 (MiniMax official); max_tokens=5000 for pass@1 and 1200 for pass@5; k=5 samples per failed problem, with early stop on the first pass.
- **Grading**: each candidate is run in a subprocess with a 20 s timeout and must pass ALL EvalPlus tests.
- **Extractor**: `jang_tools.kimi_prune.bench_humaneval._extract_code` (≥ 2026-04-24). The earlier extractor mis-paired markdown fences when the model emitted token-boundary glitches at the language tag (e.g. `\`\`\`python一致:`, `\`\`\`pythonfr`) and when the chat template prefilled `<think>` at the prompt boundary, costing roughly nine points of pass@1.
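The fence-pairing problem described in the extractor note can be illustrated with a small tolerant extractor. This is a sketch of the general idea only, not the `jang_tools` extractor: it ignores whatever follows the opening fence on the same line (so a glitched language tag cannot break pairing), drops everything before the last `</think>` (an assumed closing tag), and returns the last complete fenced block. The name `extract_code` is hypothetical.

```python
import re

# Opening fence, then any tag text up to the newline, then the block body.
FENCE = re.compile(r"```[^\n`]*\n(.*?)\n?```", re.DOTALL)


def extract_code(completion):
    """Return the last fenced block after any reasoning, or None if there is none."""
    answer = completion.rsplit("</think>", 1)[-1]
    blocks = FENCE.findall(answer)
    return blocks[-1] if blocks else None
```

A `None` result corresponds to the case where no code block is present, for example when the token budget is exhausted before the answer is written.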
| Metric | Score |
|--------|-------|
| **pass@1 (sampled, temp=1.0)** | **81.10%** (133/164) |
| **pass@5 (sampled, retry of failures)** | **90.24%** (148/164) |

Of the 46 pass@1 failures counted with the earlier extractor, 30 are resolved: 15 were correct answers lost to fence mis-pairing and are recovered by the extractor fix (leaving the 31 failures behind the 81.10% pass@1 score), and another 15 pass under pass@5 sampling. The 16 residuals split into ~8 token-budget starvations (`no_code_block`), ~5 in-code 2-bit token-boundary glitches (`return False言`, `Nonef`, etc.), and ~3 genuine logic errors on EvalPlus hidden tests.
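The reported counts can be cross-checked with a few lines of arithmetic. The snippet below uses only numbers stated in this section; the variable names are illustrative.

```python
TOTAL = 164                # HumanEval+ problems
old_failures = 46          # pass@1 failures counted with the earlier extractor
extractor_recovered = 15   # correct answers lost to fence mis-pairing
retry_recovered = 15       # additional problems passed on the pass@5 retry

pass1 = TOTAL - (old_failures - extractor_recovered)
pass5 = pass1 + retry_recovered
residual = TOTAL - pass5

print(f"pass@1: {pass1}/{TOTAL} = {pass1 / TOTAL:.2%}")      # 133/164 = 81.10%
print(f"pass@5: {pass5}/{TOTAL} = {pass5 / TOTAL:.2%}")      # 148/164 = 90.24%
print(f"extractor gain: {extractor_recovered / TOTAL:.2%}")  # 9.15%
print(f"residual failures: {residual}")                      # 16
```

The 9.15% extractor gain matches the "roughly nine points of pass@1" attributed to the earlier extractor, and the 16 residual failures match the ~8 + ~5 + ~3 breakdown.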
## Variants

| Variant | Prune | Size | HF |
|---------|-------|------|-----|
| **MiniMax-M2.7-Small** | 40% | 38 GB | `OsaurusAI/MiniMax-M2.7-Small-JANGTQ` |
| MiniMax-M2.7-Med | 25% | ~48 GB | `OsaurusAI/MiniMax-M2.7-Med-JANGTQ` *(pending)* |
| MiniMax-M2.7-Large | 10% | ~57 GB | `OsaurusAI/MiniMax-M2.7-Large-JANGTQ` *(pending)* |

Also released under `JANGQ-AI/MiniMax-M2.7-*-JANGTQ`.

## Credits

Base model: [MiniMax M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).

Methodology: [JANG toolchain](https://github.com/jinho-jang/jang) (REAP saliency + JANGTQ codebook quantization).

Served by: [Osaurus](https://osaurus.ai) (Apple-Silicon-native MLX inference).

## License

Modified MIT, inherited from MiniMax M2.