---
tags:
- moe

A lean, efficiency-first variant of MiniMax-M2 designed to maximize **latency, throughput, and VRAM savings** for local, on-prem, and edge deployments.

> **Note:** `MiniMax-M2-THRIFT-55` and `MiniMax-M2-THRIFT-55-v1` refer to the same model variant.

---

* **Teacher–student setup:** Start with **MiniMax-M2** as teacher and a copy as student.
* **Gradual expert pruning:** Remove **≈5% of experts per stage** over **~11 stages** (≈**55% total**), guided by importance scores with a lightweight **Leave-One-Expert-Out** check to retain rare-but-important experts.
* **Distill after each prune:** Retrain the student to imitate the teacher on:
  * **Outputs** (token probability distributions),
  * **Hidden states**, and
  * **Router behavior** over the **surviving experts**.
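The staged prune-then-distill loop described above can be sketched in a few lines. This is an illustrative reconstruction only: `staged_prune`, its importance scores, and the LOEO-protected set are hypothetical stand-ins, not the actual training code.

```python
def staged_prune(importance, protected=(), stages=11, frac_per_stage=0.05):
    """Iterative expert pruning: drop roughly 5% of the original expert
    pool per stage, ranked by importance score. Experts in `protected`
    (flagged rare-but-important by a leave-one-expert-out check) are
    never removed; a distillation pass would run between stages."""
    n_total = len(importance)
    per_stage = max(1, round(n_total * frac_per_stage))
    keep = set(range(n_total))
    for _ in range(stages):
        # rank surviving, unprotected experts from least to most important
        candidates = sorted(
            (e for e in keep if e not in protected),
            key=lambda e: importance[e],
        )
        for expert in candidates[:per_stage]:
            keep.remove(expert)
        # ... retrain the pruned student against the teacher here ...
    return sorted(keep)
```

With 100 experts, 11 stages × 5% removes 55 experts, matching the ≈55% total budget above.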

# Model Report — THRIFT-55-v1

**Evaluation windows:** Nov 7–9, 2025 & Nov 24–25, 2025
**Last updated:** Nov 26, 2025
**Eval status:** 6/8 benchmarks complete (**75%**); WildBench & SWE-Bench pending.

---

## 📊 Results to date

| Category | Score |
| :---------- | -----: |
| Social Sci. | 71.66% |
| Other | 63.69% |

**Selected Tasks (lm-eval)**

| Task | Score |
| :----------------------- | -----: |
| openbookqa (acc_norm) | 38.20% |
| rte | 68.23% |
| winogrande | 64.64% |
| **Average (8 tasks)** | **62.05%** |

---

### 2) Code Generation (EvalPlus)

**MBPP (Python, 378 problems)**

| Metric | Score | Problems Solved |
| :------ | -----: | --------------: |
| MBPP | 42.1% | 159 / 378 |
| MBPP+ | 37.3% | 141 / 378 |
| Average | 39.7% | – |

**HumanEval (164 problems)**

| Metric | Score | Problems Solved |
| :--------- | -----: | --------------: |
| HumanEval | 40.2% | 66 / 164 |
| HumanEval+ | 39.6% | 65 / 164 |
| Average | 39.9% | – |

---

### 3) LiveCodeBench (Live Coding)

| Metric | Value |
| :------- | -----: |
| pass@1 | 16.48% |
| Problems | 182 |

Configuration: temperature **0.2** (near-greedy decoding).
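For context, pass@k is the standard unbiased estimator from the Codex paper (Chen et al., 2021), reported here at k=1. A minimal sketch; the helper below is illustrative, not LiveCodeBench's own harness:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k drawn samples
    passes, given c of n generated samples passed (Chen et al., 2021)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(1, 1, 1))  # solved problem, single sample -> 1.0
print(pass_at_k(1, 0, 1))  # failed problem, single sample -> 0.0
```

With one sample per problem, averaging pass@1 over the benchmark is just the solve rate, so 16.48% over 182 problems corresponds to roughly 30 solved problems (30/182 ≈ 16.48%).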

---

### 4) Math Reasoning

**GSM8K (Grade School Math, 1,319 problems)**

| Metric | Score | Problems Solved |
| :----- | -----: | --------------: |
| GSM8K | 84.91% | 1,120 / 1,319 |

**MATH-500 (Competition Math)**

| Metric | Score |
| :------- | -----: |
| Overall | 90.8% |
| Level 1 | 97.67% |
| Level 2 | 95.56% |
| Level 3 | 89.52% |
| Level 4 | 90.62% |
| Level 5 | 86.57% |
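The Score and Problems Solved columns are related by plain accuracy, solved/total. A quick sanity check with a hypothetical helper (not part of any eval harness):

```python
def accuracy_pct(solved: int, total: int, digits: int = 2) -> float:
    """Fraction of problems solved, as a rounded display percentage."""
    return round(100.0 * solved / total, digits)

print(accuracy_pct(1120, 1319))   # GSM8K row -> 84.91
print(accuracy_pct(159, 378, 1))  # MBPP row (EvalPlus table) -> 42.1
```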

---

### 5) Coming next

Remaining benchmarks still in the queue:

* **WildBench** (open-world / wild-task robustness)
* **SWE-Bench** (software engineering & repo-level tasks)

---

## SGLang Deployment (Python)

Use a fresh virtual environment (e.g., `venv`, `conda`, or `uv`).

```shell
git clone -b v0.5.4.post1 https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python"
```

**4-GPU launch**

```shell
python -m sglang.launch_server \
  --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
  --tp-size 4
```

**8-GPU launch**

```shell
python -m sglang.launch_server \
  --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
  --tp-size 8
```

### Quick Test (OpenAI-compatible)

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "VibeStudio/MiniMax-M2-THRIFT-55-v1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
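The same endpoint can be exercised from Python using only the standard library. A minimal sketch; the `chat` helper and its defaults are illustrative, and assume an SGLang server from the launch commands above is running on localhost:8000:

```python
import json
import urllib.request

MODEL = "VibeStudio/MiniMax-M2-THRIFT-55-v1"

def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST to the OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `print(chat("Hello!"))` once the server is up.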
## Benchmarks

This README currently reflects results for:

* **MMLU** (+ 8 lm-eval tasks, 62.05% avg)
* **MBPP** & **MBPP+** (EvalPlus)
* **HumanEval** & **HumanEval+** (EvalPlus)
* **LiveCodeBench**
* **GSM8K**
* **MATH-500**

Evaluation status: **75% complete (6/8 benchmarks)**; **WildBench** and **SWE-Bench** will be added here once finalized.

---

## License

Derived from MiniMax-M2 and distributed under the **MIT License**: [http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

---