Tags: Text Generation, MLX, Safetensors, minimax_m2, osaurus, jangtq, jangtq-prestack, jangtq-k, mixed-precision, minimax, minimax-m2, Mixture of Experts, apple-silicon, conversational, reasoning, chain-of-thought, quantization, 230b, custom_code
Instructions to use OsaurusAI/MiniMax-M2.7-JANGTQ_K with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/MiniMax-M2.7-JANGTQ_K with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/MiniMax-M2.7-JANGTQ_K")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OsaurusAI/MiniMax-M2.7-JANGTQ_K with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ_K"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OsaurusAI/MiniMax-M2.7-JANGTQ_K" }
      ]
    }
  }
}
```

Run Pi
```bash
# Start Pi in your project directory:
pi
```
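If `~/.pi/agent/models.json` already lists other providers, overwriting it with the JSON above would drop them. A minimal sketch of a merge instead, assuming the file has the `providers` layout shown above; `add_mlx_provider` is a hypothetical helper, not part of Pi:

```python
import copy
import json
from pathlib import Path

# Hypothetical helper (not part of Pi): merges the "mlx-lm" provider from the
# snippet above into an existing models.json, keeping any other providers.
MODEL_ID = "OsaurusAI/MiniMax-M2.7-JANGTQ_K"

MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def add_mlx_provider(config_path: Path) -> dict:
    """Add (or overwrite) the "mlx-lm" provider and write the file back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    providers = config.setdefault("providers", {})
    # Replaces any existing "mlx-lm" entry; other providers are left untouched.
    providers["mlx-lm"] = copy.deepcopy(MLX_PROVIDER)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it once with `add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")`. Any models you previously listed under an `mlx-lm` provider are replaced, so extend `MLX_PROVIDER["models"]` first if you need more than one.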
- Hermes Agent
How to use OsaurusAI/MiniMax-M2.7-JANGTQ_K with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ_K"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OsaurusAI/MiniMax-M2.7-JANGTQ_K
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use OsaurusAI/MiniMax-M2.7-JANGTQ_K with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "OsaurusAI/MiniMax-M2.7-JANGTQ_K"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "OsaurusAI/MiniMax-M2.7-JANGTQ_K"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OsaurusAI/MiniMax-M2.7-JANGTQ_K",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
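The curl call above can also be made from Python with only the standard library. A minimal sketch, assuming the server listens on mlx_lm.server's default port 8080 (the port used in the Pi and Hermes configurations above) and that the model's reasoning arrives inline in `content`, closed by a `</think>` tag; `build_request`, `split_thinking`, and `chat` are hypothetical helpers, not part of mlx-lm:

```python
import json
import urllib.request

# Assumption: mlx_lm.server running locally on its default port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "OsaurusAI/MiniMax-M2.7-JANGTQ_K"


def build_request(messages, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the OpenAI-style /chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_thinking(content):
    """Split a reply into (reasoning, answer).

    Assumption: reasoning ends at the first "</think>" tag. The opening
    "<think>" may be absent because the chat template emits it itself.
    """
    head, sep, tail = content.partition("</think>")
    if not sep:
        # No closing tag: treat the whole reply as the answer.
        return "", content.strip()
    return head.replace("<think>", "", 1).strip(), tail.strip()


def chat(messages):
    """Send the request and return (reasoning, answer) for the first choice."""
    with urllib.request.urlopen(build_request(messages)) as resp:
        payload = json.load(resp)
    return split_thinking(payload["choices"][0]["message"]["content"])
```

For example, `reasoning, answer = chat([{"role": "user", "content": "Hello"}])` returns the two parts separately, which keeps chain-of-thought text out of anything you display. If your server version returns reasoning in a separate field instead, `split_thinking` simply passes the content through unchanged.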
Add MMLU-200 (93.5%), speed/memory benchmarks, fix variants table sizes (47→56 GB), expand topic tags (#1)

Opened by dealignai
README.md (changed)

```diff
@@ -14,6 +14,12 @@ tags:
 - minimax-m2
 - moe
 - apple-silicon
+- text-generation
+- conversational
+- reasoning
+- chain-of-thought
+- quantization
+- 230b
 pipeline_tag: text-generation
 base_model: MiniMaxAI/MiniMax-M2.7
 base_model_relation: quantized
@@ -39,6 +45,18 @@ JANGTQ_K** quantization in JANGTQ-PRESTACK layout.
 - **Bundle size:** **~74 GB on-disk** (~3-bit avg routed)
 - **Runs on:** M3 Max 96 GB+ / M4 Max 128 GB / M5 Max 128 GB / Mac Studio

+## Benchmarks
+
+| Metric | Value | Setup |
+|---|---|---|
+| **MMLU-200** | **93.5%** (187/200) | thinking ON, `q_per_subject=20`, 10 subjects |
+| Median speed | ~37 tok/s | M4 Max 128 GB, MLX 0.31 |
+| GPU memory at load | ~75 GB | warm |
+
+MMLU eval used the standard `mmlu_jangtq_resume.py` runner with the model's
+default chat template (`enable_thinking` undefined → thinking ON, which the
+M2.7 template auto-opens with `<think>\n` after the assistant prefix).
+
 ## Why mixed-bit?

 `down_proj`'s output enters the residual stream and accumulates across
@@ -49,10 +67,10 @@ quality close to full-4-bit (~115 GB) at **64% the size**.

 ## Variants in the MiniMax-M2.7 line

-| Variant | Routed bits (avg) |
-|---|---|---|---|
-| `MiniMax-M2.7-JANGTQ` | 2-bit |
-| **`MiniMax-M2.7-JANGTQ_K` (this)** | **~3-bit (mixed 2/4)** | **74 GB** | **
+| Variant | Routed bits (avg) | Size | MMLU-200 | Use case |
+|---|---|---|---|---|
+| [`MiniMax-M2.7-JANGTQ`](https://huggingface.co/OsaurusAI/MiniMax-M2.7-JANGTQ) | 2-bit | 56 GB | 91.5% | smallest, best for tight RAM |
+| **`MiniMax-M2.7-JANGTQ_K` (this)** | **~3-bit (mixed 2/4)** | **74 GB** | **93.5%** | **+2.0pp MMLU vs JANGTQ for +18 GB** |

 ## Loading

```
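The headline numbers in this change can be cross-checked with a few lines of arithmetic: 93.5% of 200 questions is 187 correct, the JANGTQ_K row's "+2.0pp for +18 GB" follows from 93.5% minus 91.5% and 74 GB minus 56 GB, and 74 GB is about 64% of the ~115 GB full-4-bit size quoted in the third hunk header. The sketch below only re-derives those figures from the tables; it does not run any evaluation, and `correct_answers` is a hypothetical helper:

```python
# Figures copied from the Benchmarks and Variants tables in the diff.
MMLU_QUESTIONS = 200
FULL_4BIT_GB = 115  # "~115 GB" full-4-bit reference from the hunk header

variants = {
    "MiniMax-M2.7-JANGTQ": {"size_gb": 56, "mmlu_pct": 91.5},
    "MiniMax-M2.7-JANGTQ_K": {"size_gb": 74, "mmlu_pct": 93.5},
}


def correct_answers(pct, total=MMLU_QUESTIONS):
    """Convert an accuracy percentage back to a whole number of questions."""
    return round(pct / 100 * total)


base = variants["MiniMax-M2.7-JANGTQ"]
k = variants["MiniMax-M2.7-JANGTQ_K"]

# Accuracy gain in percentage points and the extra disk cost in GB.
delta_pp = round(k["mmlu_pct"] - base["mmlu_pct"], 1)
delta_gb = k["size_gb"] - base["size_gb"]
# Fraction of the full-4-bit bundle size.
size_ratio = k["size_gb"] / FULL_4BIT_GB

print(f"JANGTQ_K: {correct_answers(k['mmlu_pct'])}/{MMLU_QUESTIONS} correct")
print(f"+{delta_pp}pp MMLU for +{delta_gb} GB")
print(f"{size_ratio:.0%} of the full-4-bit size")
```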