---
tags:
- moe

A lean, efficiency-first variant of MiniMax-M2 designed to maximize **latency, throughput, and VRAM savings** for local, on-prem, and edge deployments.

> **Note:** `MiniMax-M2-THRIFT-55` and `MiniMax-M2-THRIFT-55-v1` refer to the same model variant.

---

* **Teacher–student setup:** Start with **MiniMax-M2** as teacher and a copy as student.
* **Gradual expert pruning:** Remove **≈5% of experts per stage** over **~11 stages** (≈**55% total**), guided by importance scores with a lightweight **Leave-One-Expert-Out** check to retain rare-but-important experts.
* **Distill after each prune:** Retrain the student to imitate the teacher on:
  * **Outputs** (token probability distributions),
  * **Hidden states**, and
  * **Router behavior** over the **surviving experts**.
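The staged prune-then-distill loop described above can be sketched in a few lines. This is an illustrative reconstruction only: `staged_prune`, its importance scores, and the LOEO-protected set are hypothetical stand-ins, not the actual training code.

```python
def staged_prune(importance, protected=(), stages=11, frac_per_stage=0.05):
    """Iterative expert pruning: drop roughly 5% of the original expert
    pool per stage, ranked by importance score. Experts in `protected`
    (flagged rare-but-important by a leave-one-expert-out check) are
    never removed; a distillation pass would run between stages."""
    n_total = len(importance)
    per_stage = max(1, round(n_total * frac_per_stage))
    keep = set(range(n_total))
    for _ in range(stages):
        # rank surviving, unprotected experts from least to most important
        candidates = sorted(
            (e for e in keep if e not in protected),
            key=lambda e: importance[e],
        )
        for expert in candidates[:per_stage]:
            keep.remove(expert)
        # ... retrain the pruned student against the teacher here ...
    return sorted(keep)
```

With 100 experts, 11 stages × 5% removes 55 experts, matching the ≈55% total budget above.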

# Model Report — THRIFT-55-v1

**Evaluation windows:** Nov 7–9, 2025 & Nov 24–25, 2025
**Last updated:** Nov 26, 2025
**Eval status:** 6/8 benchmarks complete (**75%**); WildBench & SWE-Bench pending.

---

## 📊 Results to date

| Category | Score |
| :---------- | -----: |
| Social Sci. | 71.66% |
| Other | 63.69% |

**Selected Tasks (lm-eval)**

| Task | Score |
| :----------------------- | -----: |
| openbookqa (acc_norm) | 38.20% |
| rte | 68.23% |
| winogrande | 64.64% |
| **Average (8 tasks)** | **62.05%** |

---

### 2) Code Generation (EvalPlus)

**MBPP (Python, 378 problems)**

| Metric | Score | Problems Solved |
| :------ | -----: | --------------: |
| MBPP | 42.1% | 159 / 378 |
| MBPP+ | 37.3% | 141 / 378 |
| Average | 39.7% | – |

**HumanEval (164 problems)**

| Metric | Score | Problems Solved |
| :--------- | -----: | --------------: |
| HumanEval | 40.2% | 66 / 164 |
| HumanEval+ | 39.6% | 65 / 164 |
| Average | 39.9% | – |

---

### 3) LiveCodeBench (Live Coding)

| Metric | Value |
| :------- | -----: |
| pass@1 | 16.48% |
| Problems | 182 |

Configuration: temperature **0.2** (near-greedy decoding).
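For context, pass@k is the standard unbiased estimator from the Codex paper (Chen et al., 2021), reported here at k=1. A minimal sketch; the helper below is illustrative, not LiveCodeBench's own harness:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k drawn samples
    passes, given c of n generated samples passed (Chen et al., 2021)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(1, 1, 1))  # solved problem, single sample -> 1.0
print(pass_at_k(1, 0, 1))  # failed problem, single sample -> 0.0
```

With one sample per problem, averaging pass@1 over the benchmark is just the solve rate, so 16.48% over 182 problems corresponds to roughly 30 solved problems (30/182 ≈ 16.48%).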

---

### 4) Math Reasoning

**GSM8K (Grade School Math, 1,319 problems)**

| Metric | Score | Problems Solved |
| :----- | -----: | --------------: |
| GSM8K | 84.91% | 1,120 / 1,319 |

**MATH-500 (Competition Math)**

| Metric | Score |
| :------- | -----: |
| Overall | 90.8% |
| Level 1 | 97.67% |
| Level 2 | 95.56% |
| Level 3 | 89.52% |
| Level 4 | 90.62% |
| Level 5 | 86.57% |
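The Score and Problems Solved columns are related by plain accuracy, solved/total. A quick sanity check with a hypothetical helper (not part of any eval harness):

```python
def accuracy_pct(solved: int, total: int, digits: int = 2) -> float:
    """Fraction of problems solved, as a rounded display percentage."""
    return round(100.0 * solved / total, digits)

print(accuracy_pct(1120, 1319))   # GSM8K row -> 84.91
print(accuracy_pct(159, 378, 1))  # MBPP row (EvalPlus table) -> 42.1
```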

---

### 5) Coming next

Remaining benchmarks still in the queue:

* **WildBench** (open-world / wild-task robustness)
* **SWE-Bench** (software engineering & repo-level tasks)

---

## SGLang Deployment (Python)

Use a fresh virtual environment (e.g., `venv`, `conda`, or `uv`).

```shell
git clone -b v0.5.4.post1 https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python"
```

**4-GPU launch**

```shell
python -m sglang.launch_server \
  --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
  --tp-size 4
```

**8-GPU launch**

```shell
python -m sglang.launch_server \
  --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
  --tp-size 8
```

### Quick Test (OpenAI-compatible)

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "VibeStudio/MiniMax-M2-THRIFT-55-v1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
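The same endpoint can be exercised from Python using only the standard library. A minimal sketch; the `chat` helper and its defaults are illustrative, and assume an SGLang server from the launch commands above is running on localhost:8000:

```python
import json
import urllib.request

MODEL = "VibeStudio/MiniMax-M2-THRIFT-55-v1"

def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST to the OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `print(chat("Hello!"))` once the server is up.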
## Benchmarks

This README currently reflects results for:

* **MMLU** (+ 8 lm-eval tasks, 62.05% avg)
* **MBPP** & **MBPP+** (EvalPlus)
* **HumanEval** & **HumanEval+** (EvalPlus)
* **LiveCodeBench**
* **GSM8K**
* **MATH-500**

Evaluation status: **75% complete (6/8 benchmarks)**; **WildBench** and **SWE-Bench** will be added here once finalized.

---

## License

Derived from MiniMax-M2 and distributed under the **MIT License**: [http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

---