Text Generation
MLX
Safetensors
English
Chinese
Mixture of Experts
mixture-of-experts
minimax_m2
quantized
apple-silicon
turboquant
jangtq
jangtq2
reap
Instructions to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/MiniMax-M2.7-Small-JANGTQ")

prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- MLX LM
How to use OsaurusAI/MiniMax-M2.7-Small-JANGTQ with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm
# Generate some text
mlx_lm.generate --model "OsaurusAI/MiniMax-M2.7-Small-JANGTQ" --prompt "Once upon a time"
```
fix(eval): correct pass@1=81.10/pass@5=90.24 after extractor bug fix
README.md (changed)
@@ -91,13 +91,14 @@ out = generate(model, tokenizer, prompt=prompt, max_tokens=4096,
- **Protocol**: sampled pass@1 baseline + pass@5 retry on failures.
- **Sampling for both pass@1 and pass@5 retry**: temp=1.0, top_p=0.95, top_k=40 (MiniMax official); max_tokens=5000 on pass@1, 1200 on pass@5; k=5 samples per failed problem, stopping early at the first pass.
- **Grading**: each candidate runs with a 20s subprocess timeout and must pass ALL EvalPlus tests.
- **Extractor**: `jang_tools.kimi_prune.bench_humaneval._extract_code` (≥ 2026-04-24). The earlier extractor mis-paired markdown fences when the model emitted token-boundary glitches at the language tag (e.g. `` ```python一致: ``, `` ```pythonfr ``) and when the chat template prefilled `<think>` at the prompt boundary, costing roughly nine points of pass@1.
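The two failure modes named in the **Extractor** bullet can be illustrated with a small extractor. The sketch below is hypothetical and is not `jang_tools.kimi_prune.bench_humaneval._extract_code`; the name `extract_code` and its heuristics (discard everything up to the last `</think>`, treat any line starting with three backticks as a fence regardless of what follows on that line, return the last complete block) are assumptions chosen to show how tolerant fence pairing sidesteps the bug.

```python
import re
from typing import List, Optional

# Hypothetical sketch; NOT the jang_tools implementation.
# A fence is three backticks at the start of a line. Anything after them on the
# same line (a language tag, or glitched tokens such as "python一致:") is ignored.
FENCE_LINE = re.compile(r"^\s*```")


def extract_code(text: str) -> Optional[str]:
    """Return the last complete fenced code block in `text`, or None."""
    # Drop reasoning: keep only what follows the last closing </think> tag. This
    # also covers a <think> that the chat template prefilled before generation.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]

    blocks: List[str] = []
    current: Optional[List[str]] = None
    for line in text.splitlines():
        if FENCE_LINE.match(line):
            if current is None:
                # Opening fence: start collecting, whatever follows the backticks.
                current = []
            else:
                # Closing fence: finish the current block.
                blocks.append("\n".join(current))
                current = None
        elif current is not None:
            current.append(line)

    # An unclosed trailing block (e.g. output cut off by max_tokens) is discarded;
    # a caller could report None as a "no_code_block" failure.
    return blocks[-1] if blocks else None
```

Keying on the fence line alone, rather than on an exact `python` language tag, is what keeps glitched tags such as `pythonfr` from breaking the pairing in this sketch.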

| Metric | Score |
|--------|-------|
| **pass@1 (sampled, temp=1.0)** | **81.10%** (133/164) |
| **pass@5 (sampled, retry of failures)** | **90.24%** (148/164) |
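The two rows of the table follow from the protocol bullets: one sampled attempt per problem for pass@1, then up to k=5 fresh samples for each failure with early stopping, where a candidate counts only if it passes every test within the 20 s limit. A minimal sketch, assuming hypothetical `sample_fn` (a model call with the MiniMax sampling settings) and `passes_fn` (an EvalPlus-style grader) callables; `run_with_timeout` shows one standard-library way to enforce a subprocess timeout and is not the benchmark's actual harness.

```python
import subprocess
import sys
from typing import Callable, List, Tuple


def run_with_timeout(program: str, timeout_s: float = 20.0) -> bool:
    """Run a candidate program (solution + tests) in a fresh interpreter.

    Returns True only if it exits with status 0 before the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def evaluate(
    problems: List[str],
    sample_fn: Callable[[str, int], str],   # hypothetical: (problem, max_tokens) -> candidate
    passes_fn: Callable[[str, str], bool],  # hypothetical: (problem, candidate) -> passes all tests
    k: int = 5,
) -> Tuple[float, float]:
    """Return (pass@1, pass@5) under the sampled baseline + retry-on-failure protocol."""
    pass1 = 0
    pass5 = 0
    for problem in problems:
        # pass@1: a single sampled attempt with the larger token budget.
        if passes_fn(problem, sample_fn(problem, 5000)):
            pass1 += 1
            pass5 += 1
            continue
        # pass@5 retry: up to k more samples with a 1200-token budget,
        # stopping early at the first passing candidate.
        for _ in range(k):
            if passes_fn(problem, sample_fn(problem, 1200)):
                pass5 += 1
                break
    n = len(problems)
    return pass1 / n, pass5 / n
```

In this sketch, pass@5 is the share of problems solved either by the pass@1 sample or by one of up to five retries; it counts solved problems directly rather than estimating pass@k from a fixed number of samples per problem.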

After the extractor fix, 30 of the 46 pass@1 failures counted under the old extractor are resolved: 15 were correct answers lost to fence mis-pairing, and another 15 recover under pass@5 sampling. The 16 residual failures split into ~8 token-budget starvations (`no_code_block`), ~5 in-code 2-bit token-boundary glitches (`return False言`, `Nonef`, etc.), and ~3 genuine logic errors on EvalPlus hidden tests.
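The accounting in the paragraph above can be checked against the table with a few lines of arithmetic. All inputs come from the README text; the residual split is marked approximate there, and the pre-fix pass@1 (118/164) is derived from the 46 original failures rather than reported.

```python
# Numbers taken from the README text; the residual split is approximate ("~").
TOTAL = 164               # HumanEval problems (denominator in the table)
ORIGINAL_FAILURES = 46    # pass@1 failures counted with the old extractor
FENCE_PAIRING_FIXES = 15  # correct answers the old extractor mis-paired
RECOVERED_AT_PASS5 = 15   # remaining failures that pass on a retry
RESIDUAL_CAUSES = {
    "no_code_block (token-budget starvation)": 8,
    "in-code 2-bit token-boundary glitch": 5,
    "genuine logic error on hidden tests": 3,
}


def pct(n: int) -> str:
    """Format n / TOTAL as a percentage with two decimals."""
    return f"{100 * n / TOTAL:.2f}"


pass1_failures = ORIGINAL_FAILURES - FENCE_PAIRING_FIXES
pass1 = TOTAL - pass1_failures
residual = pass1_failures - RECOVERED_AT_PASS5
pass5 = TOTAL - residual
old_pass1 = TOTAL - ORIGINAL_FAILURES

# The three residual causes should account for every remaining failure.
assert sum(RESIDUAL_CAUSES.values()) == residual

print(f"pass@1: {pass1}/{TOTAL} = {pct(pass1)}%")
print(f"pass@5: {pass5}/{TOTAL} = {pct(pass5)}%")
print(f"old pass@1: {old_pass1}/{TOTAL} = {pct(old_pass1)}%")
print(f"extractor fix: +{100 * (pass1 - old_pass1) / TOTAL:.2f} points")
```

Run as-is, it prints 133/164 = 81.10% and 148/164 = 90.24%, matching the table, plus an implied pre-fix pass@1 of 118/164 = 71.95%, a gap of 9.15 points that is consistent with the "roughly nine points" in the Extractor bullet.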
## Variants