vibestudio-HQ committed
Commit 4f17331 · verified · 1 Parent(s): 3e23ef2

Update README.md

Files changed (1):
  1. README.md +71 -32

README.md CHANGED
@@ -1,3 +1,5 @@
 ---
 tags:
 - moe
@@ -20,11 +22,7 @@ base_model:

 A lean, efficiency-first variant of MiniMax-M2 designed to maximize **latency, throughput, and VRAM savings** for local, on-prem, and edge deployments.

-## TLDR
-
-* **What:** ~55% expert-pruned MoE with staged pruning + knowledge distillation.
-* **Why:** Push the efficiency frontier for compact, responsive deployments.
-* **Now:** Ready for experimentation with solid coverage across core evals and more on the way.

 ---

@@ -51,7 +49,6 @@ A lean, efficiency-first variant of MiniMax-M2 designed to maximize **latency, t
 * **Teacher–student setup:** Start with **MiniMax-M2** as teacher and a copy as student.
 * **Gradual expert pruning:** Remove **≈5% experts per stage** over **~11 stages** (≈**55% total**), guided by importance scores with a lightweight **Leave-One-Expert-Out** check to retain rare-but-important experts.
 * **Distill after each prune:** Retrain the student to imitate the teacher on
-
   * **Outputs** (token probability distributions),
   * **Hidden states**, and
   * **Router behavior** over the **surviving experts**.
@@ -63,8 +60,11 @@ https://github.com/latent-variable/minimax-agent-guide

 # Model Report — THRIFT-55-v1

-**Evaluation windows:** Nov 7–9, 2025 & Nov 24, 2025
-**Last updated:** Nov 25, 2025

 ## 📊 Results to date

@@ -80,7 +80,7 @@ https://github.com/latent-variable/minimax-agent-guide
 | Social Sci. | 71.66% |
 | Other | 63.69% |

-**Selected Tasks**

 | Task | Score |
 | :----------------------- | -----: |
@@ -92,24 +92,29 @@ https://github.com/latent-variable/minimax-agent-guide
 | openbookqa (acc_norm) | 38.20% |
 | rte | 68.23% |
 | winogrande | 64.64% |

 ### 2) Code Generation (EvalPlus)

 **MBPP (Python, 378 problems)**

-| Metric | Score |
-| :------ | ----: |
-| MBPP | 42.1% |
-| MBPP+ | 37.3% |
-| Average | 39.7% |

 **HumanEval (164 problems)**

-| Metric | Score |
-| :--------- | ----: |
-| HumanEval | 40.2% |
-| HumanEval+ | 39.6% |
-| Average | 39.9% |

 ### 3) LiveCodeBench (Live Coding)

@@ -118,18 +123,48 @@ https://github.com/latent-variable/minimax-agent-guide
 | pass@1 | 16.48% |
 | Problems | 182 |

-### 4) Coming next

-* **GSM8K** and **MATH-500** (math suite)
-* **WildBench** and **SWE-Bench** (knowledge & software tasks)

 ---

 ## SGLang Deployment (Python)

-> Use a fresh virtual environment (e.g., `venv`, `conda`, or `uv`).

-```bash
 git clone -b v0.5.4.post1 https://github.com/sgl-project/sglang.git
 cd sglang

@@ -139,7 +174,7 @@ pip install -e "python"

 **4-GPU launch**

-```bash
 python -m sglang.launch_server \
   --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
   --tp-size 4 \
@@ -153,7 +188,7 @@ python -m sglang.launch_server \

 **8-GPU launch**

-```bash
 python -m sglang.launch_server \
   --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
   --tp-size 8 \
@@ -168,7 +203,7 @@ python -m sglang.launch_server \

 ### Quick Test (OpenAI-compatible)

-```bash
 curl http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
@@ -184,18 +219,22 @@ curl http://localhost:8000/v1/chat/completions \

 ## Benchmarks

-This README reflects **MMLU**, **MBPP**, **HumanEval**, and **LiveCodeBench** results completed by **Nov 25, 2025**. Additional benchmarks will appear here as they finish.

-## Research paper

-Coming soon.

 ---

 ## License

-Derived from MiniMax-M2 and distributed under the **MIT License**
-[http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

 ---

 ---
 tags:
 - moe
 

 A lean, efficiency-first variant of MiniMax-M2 designed to maximize **latency, throughput, and VRAM savings** for local, on-prem, and edge deployments.

+> **Note:** `MiniMax-M2-THRIFT-55` and `MiniMax-M2-THRIFT-55-v1` refer to the same model variant.

 ---

 
 * **Teacher–student setup:** Start with **MiniMax-M2** as teacher and a copy as student.
 * **Gradual expert pruning:** Remove **≈5% experts per stage** over **~11 stages** (≈**55% total**), guided by importance scores with a lightweight **Leave-One-Expert-Out** check to retain rare-but-important experts.
 * **Distill after each prune:** Retrain the student to imitate the teacher on
   * **Outputs** (token probability distributions),
   * **Hidden states**, and
   * **Router behavior** over the **surviving experts**.
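The staged prune-then-distill loop above can be sketched in miniature. Everything below is a schematic with made-up stand-ins (64 toy experts, random importance scores, a synthetic `toy_loss`), not the actual THRIFT training code; it only illustrates the ≈5%-per-stage schedule and the Leave-One-Expert-Out ranking.

```python
# Schematic sketch of staged expert pruning with a Leave-One-Expert-Out check.
# Illustrative only: toy importance scores stand in for real MoE signals.
import random

random.seed(0)
E = 64                                   # toy expert count
importance = [random.random() for _ in range(E)]
total_imp = sum(importance)

def toy_loss(active):
    """Stand-in for validation loss with only `active` experts enabled."""
    return 1.0 - sum(importance[e] for e in active) / total_imp

def loeo_delta(active, e):
    """Leave-One-Expert-Out: how much the loss rises if expert e is removed."""
    return toy_loss(active - {e}) - toy_loss(active)

target_keep = round(E * 0.45)            # ≈55% pruned overall
active = set(range(E))
stage = 0
while len(active) > target_keep:
    stage += 1
    # Prune ≈5% of the original expert count per stage, never overshooting.
    n_drop = min(max(1, round(E * 0.05)), len(active) - target_keep)
    ranked = sorted(active, key=lambda e: loeo_delta(active, e))
    for e in ranked[:n_drop]:            # drop the least damaging experts
        active.remove(e)
    # <-- distillation step goes here: retrain the student to match the
    #     teacher's outputs, hidden states, and router over `active`.

print(f"stages={stage}, experts kept={len(active)}/{E}")
```

In the real pipeline the importance scores and LOEO check would run on held-out data, and each pruning stage is followed by the output/hidden-state/router distillation described in the bullets above.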
 

 # Model Report — THRIFT-55-v1

+**Evaluation windows:** Nov 7–9, 2025 & Nov 24–25, 2025
+**Last updated:** Nov 26, 2025
+**Eval status:** 6/8 benchmarks complete (**75%**) – WildBench & SWE-Bench pending.
+
+---

 ## 📊 Results to date
 
 
 | Social Sci. | 71.66% |
 | Other | 63.69% |

+**Selected Tasks (lm-eval)**

 | Task | Score |
 | :----------------------- | -----: |
 
 | openbookqa (acc_norm) | 38.20% |
 | rte | 68.23% |
 | winogrande | 64.64% |
+| **Average (8 tasks)** | **62.05%** |
+
+---

 ### 2) Code Generation (EvalPlus)

 **MBPP (Python, 378 problems)**

+| Metric | Score | Problems Solved |
+| :------ | -----: | --------------: |
+| MBPP | 42.1% | 159 / 378 |
+| MBPP+ | 37.3% | 141 / 378 |
+| Average | 39.7% | – |

 **HumanEval (164 problems)**

+| Metric | Score | Problems Solved |
+| :--------- | -----: | --------------: |
+| HumanEval | 40.2% | 66 / 164 |
+| HumanEval+ | 39.6% | 65 / 164 |
+| Average | 39.9% | – |
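As a quick arithmetic sanity check, each EvalPlus score above can be recomputed from its solved-count column (`rows` simply restates the tables):

```python
# Recompute each EvalPlus percentage from its solved/total counts.
import math

rows = [  # (metric, solved, total, reported %)
    ("MBPP",       159, 378, 42.1),
    ("MBPP+",      141, 378, 37.3),
    ("HumanEval",   66, 164, 40.2),
    ("HumanEval+",  65, 164, 39.6),
]
for name, solved, total, reported in rows:
    pct = 100 * solved / total
    # Each reported score should match to within table rounding (±0.05 pts).
    assert math.isclose(pct, reported, abs_tol=0.05), (name, pct, reported)
```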
+
+---

 ### 3) LiveCodeBench (Live Coding)

 
 | pass@1 | 16.48% |
 | Problems | 182 |

+Configuration: temperature **0.2** (near-greedy decoding).
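The pass@1 figure is presumably computed with the standard unbiased pass@k estimator from the HumanEval paper (an assumption; LiveCodeBench's exact harness may differ). A minimal sketch:

```python
# Unbiased pass@k estimator (Chen et al., 2021): 1 - C(n-c, k) / C(n, k)
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated per problem, c = samples that passed, k = budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (near-greedy decoding), pass@1 reduces to the
# plain fraction of problems solved; average pass_at_k over all problems.
```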
+
+---
+
+### 4) Math Reasoning
+
+**GSM8K (Grade School Math, 1,319 problems)**
+
+| Metric | Score | Problems Solved |
+| :----- | -----: | --------------: |
+| GSM8K | 84.91% | 1,120 / 1,319 |
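As a sanity check, the GSM8K score matches its solved count:

```python
# GSM8K accuracy from the solved count: 1,120 of 1,319 problems.
acc = 100 * 1120 / 1319
assert round(acc, 2) == 84.91  # agrees with the reported score
```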
+
+**MATH-500 (Competition Math)**
+
+| Metric | Score |
+| :------ | -----: |
+| Overall | 90.8% |
+| Level 1 | 97.67% |
+| Level 2 | 95.56% |
+| Level 3 | 89.52% |
+| Level 4 | 90.62% |
+| Level 5 | 86.57% |
+
+---
+
+### 5) Coming next
+
+Remaining benchmarks still in the queue:

+* **WildBench** (open-world / wild-task robustness)
+* **SWE-Bench** (software engineering & repo-level tasks)

 ---
 
 ## SGLang Deployment (Python)

+Use a fresh virtual environment (e.g., `venv`, `conda`, or `uv`).

+```shell
 git clone -b v0.5.4.post1 https://github.com/sgl-project/sglang.git
 cd sglang

 

 **4-GPU launch**

+```shell
 python -m sglang.launch_server \
   --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
   --tp-size 4 \
 

 **8-GPU launch**

+```shell
 python -m sglang.launch_server \
   --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
   --tp-size 8 \
 

 ### Quick Test (OpenAI-compatible)

+```shell
 curl http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
 

 ## Benchmarks

+This README currently reflects results for:

+* **MMLU** (+ 8 lm-eval tasks, 62.05% avg)
+* **MBPP** & **MBPP+** (EvalPlus)
+* **HumanEval** & **HumanEval+** (EvalPlus)
+* **LiveCodeBench**
+* **GSM8K**
+* **MATH-500**

+Evaluation status: **75% complete (6/8 benchmarks)**; **WildBench** and **SWE-Bench** will be added here once finalized.

 ---

 ## License

+Derived from MiniMax-M2 and distributed under the **MIT License**: [https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

 ---