Text Generation
MLX
Safetensors
English
Chinese
Mixture of Experts
mixture-of-experts
minimax_m2
quantized
apple-silicon
turboquant
jangtq
jangtq2
reap
Instructions to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")
prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- MLX LM
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Generate some text
mlx_lm.generate --model "OsaurusAI/MiniMax-M2.7-Small-JANGTQ" --prompt "Once upon a time"
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -25,7 +25,7 @@ tags:
 </p>

 <h3 align="center">MiniMax M2.7 Small — 138B-A10B — JANGTQ (MLX)</h3>
-<p align="center"><b>This is now a ~138B-A10B MoE</b> (down from MiniMax M2's 230B base) — 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
+<p align="center"><b>This is now a ~138B-A10B MoE — 38 GB on disk</b> (down from MiniMax M2's ~460 GB / 230B base) — 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>

 <p align="center">
 <a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>
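The sizes quoted in the changed README line can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not something taken from the repository: it assumes "GB" means 10**9 bytes and "B" means 10**9 parameters, and the helper name `bits_per_param` is made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the README diff.
# Assumptions (not stated in the source): GB = 10**9 bytes, B = 10**9 params.

GB = 10**9


def bits_per_param(size_gb, params_billion):
    """Average storage bits per parameter for a checkpoint of the given size."""
    return size_gb * GB * 8 / (params_billion * 10**9)


base_bits = bits_per_param(460, 230)   # MiniMax M2 base, as quoted
small_bits = bits_per_param(38, 138)   # this checkpoint, as quoted

# Share of the total parameter count left after the routed-expert prune.
kept_fraction = 138 / 230

print(f"base:  {base_bits:.2f} bits/param")
print(f"small: {small_bits:.2f} bits/param")
print(f"parameters kept: {kept_fraction:.0%}")
print(f"disk reduction: {460 / 38:.1f}x")
```

Under these assumptions the base works out to 16 bits per parameter and the quantized checkpoint to roughly 2.2 bits per parameter, which is plausible for a model whose routed experts are stored at 2-bit with a smaller share of 8-bit and 16-bit tensors. The quoted totals are rounded, so the 60% kept-parameter ratio should be read as approximate: the 40% prune applies to routed experts only, not to attention, embeddings, or the router.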