vibestudio-HQ committed · Commit 214570d · verified · 1 parent: 9522a69

Update README.md

Files changed (1): README.md (+78 -85)
 
base_model:
  - MiniMaxAI/MiniMax-M2
---

![Screenshot](https://huggingface.co/VibeStudio/MiniMax-M2-THRIFT/resolve/main/vibe_processed_by_imagy.png)

# THRIFT — Targeted Reduction for Inference and Fine-Tuning

A performance-optimized variant of the base model that delivers faster responses.
 

## TLDR

We, the over-caffeinated researchers at VibeStud.io, set out to build a 50%-pruned version of the SOTA MiniMax M2 best suited for local/air-gapped coding. With this release we achieved ~25%. A 50%-pruned version is under development, while a not-so-sucky team of ours works on a 50%-pruned version of Kimi K2 Thinking. We’re writing the paper and expanding the evaluation set to substantiate the results. Check back later, cheers!
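The pruning here builds on the Wanda/Wanda++ line of work cited at the bottom of this card. The exact THRIFT recipe is unpublished (paper in progress), but the core Wanda idea is easy to sketch: score each weight by |w| times the ℓ2 norm of its input activations over a calibration batch, then zero the lowest-scoring fraction of each output row. A toy illustration — every name and number below is invented for the example, not taken from THRIFT:

```python
# Toy Wanda-style pruning score: score = |weight| * ||activation||_2.
# Weights that are small AND see weak activations are pruned first.

def wanda_prune_row(row, act_norms, sparsity):
    """Zero out the lowest-scoring `sparsity` fraction of one output row."""
    scores = [abs(w) * n for w, n in zip(row, act_norms)]
    k = int(len(row) * sparsity)          # how many weights to drop
    if k == 0:
        return list(row)
    cutoff = sorted(scores)[k - 1]        # k-th smallest score
    pruned, dropped = list(row), 0
    for j, s in enumerate(scores):
        if s <= cutoff and dropped < k:   # guard against ties over-pruning
            pruned[j] = 0.0
            dropped += 1
    return pruned

# One output row, and per-input-feature activation norms from calibration:
row = [0.9, -0.05, 0.4, -0.8]
act_norms = [0.5, 20.0, 0.1, 1.0]
print(wanda_prune_row(row, act_norms, sparsity=0.5))  # [0.0, -0.05, 0.0, -0.8]
```

Note that the tiny -0.05 weight survives because its input feature has a large activation norm, while the much larger 0.9 is pruned — this activation term is what separates Wanda from plain magnitude pruning, which would have dropped -0.05 and 0.4 instead.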
 

## Why it’s useful

* **Lower latency:** Snappier responses for interactive apps and chatbots.
* **Smaller memory footprint:** Runs on cheaper GPUs or with fewer resources per replica.
* **Higher throughput:** Serve more concurrent users at the same cost.
* **Deployment-friendly:** Drop-in replacement for the base model in most inference stacks.
* **Adaptable:** Supports light fine-tuning to match your domain and style guidelines.

## Intended use

* General chat and coding assistance
* Enterprise assistants with strict latency/VRAM budgets
* Batch or realtime serving in cloud and on-prem environments
* Edge or cost-sensitive deployments where efficiency matters

## When to use it

* You’re constrained by GPU memory or need shorter response times
* You want to increase QPS without scaling infrastructure
* You need a model that is “good enough” for most tasks at a better cost profile
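The memory claim above is back-of-envelope arithmetic: weight memory ≈ parameter count × bytes per parameter × (1 − pruned fraction), assuming pruned weights are actually removed from storage. A sketch with made-up inputs — none of these numbers are measured MiniMax-M2 figures:

```python
def weight_memory_gb(params_billions, bytes_per_param=2.0, pruned_frac=0.0):
    """Weight-only memory estimate in GB; ignores KV cache and activations."""
    return params_billions * bytes_per_param * (1.0 - pruned_frac)

# Hypothetical 100B-parameter model stored in BF16 (2 bytes per parameter):
print(weight_memory_gb(100))                    # 200.0 GB dense
print(weight_memory_gb(100, pruned_frac=0.25))  # 150.0 GB at ~25% pruning
```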

---

**Models Under Evaluation**

| Model                        | Type                 |
| :--------------------------- | :------------------- |
| ModelCloud/MiniMax-M2-BF16   | Base Model           |
| VibeStudio/MiniMax-M2-THRIFT | Compressed/Optimized |

**Evaluation Dates:** November 7–9, 2025

## 📊 Results Comparison

### 1) Multiple Choice Q&A (lm-eval)

**Overall MMLU Performance**

| Model              | MMLU Overall | Humanities | STEM   | Social Sciences | Other  |
| :----------------- | -----------: | ---------: | -----: | --------------: | -----: |
| MiniMax-M2-BF16    | **83.16%**   | 77.45%     | 80.91% | **90.02%**      | 87.29% |
| MiniMax-M2-THRIFT  | **77.72%**   | 70.14%     | 77.61% | 86.84%          | 80.27% |
| **Δ (Difference)** | **-5.44%**   | -7.31%     | -3.30% | -3.18%          | -7.02% |

**Individual Task Performance**

| Task                     | BF16 (Base) | THRIFT-BF16 | Difference    |
| :----------------------- | ----------: | ----------: | ------------: |
| arc_challenge (acc_norm) | 73.21%      | 61.01%      | -12.20% ⬇️    |
| arc_easy                 | 88.30%      | 83.08%      | -5.22% ⬇️     |
| boolq                    | 87.95%      | 84.95%      | -3.00% ⬇️     |
| hellaswag (acc_norm)     | 83.00%      | 77.09%      | -5.91% ⬇️     |
| mmlu                     | 83.16%      | 77.72%      | -5.44% ⬇️     |
| openbookqa (acc_norm)    | 48.60%      | 43.00%      | -5.60% ⬇️     |
| rte                      | 75.45%      | **80.14%**  | **+4.69% ⬆️** |
| winogrande               | 76.48%      | 74.90%      | -1.58% ⬇️     |

**Average Accuracy Drop:** **-4.28%**
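The headline average is simply the mean of the per-task deltas in the table, and can be reproduced directly:

```python
# Per-task accuracy deltas (THRIFT minus BF16) from the table above, in %.
deltas = {
    "arc_challenge": -12.20, "arc_easy": -5.22, "boolq": -3.00,
    "hellaswag": -5.91, "mmlu": -5.44, "openbookqa": -5.60,
    "rte": +4.69, "winogrande": -1.58,
}
avg = sum(deltas.values()) / len(deltas)
print(f"{avg:+.2f}%")  # -4.28%
```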

### 2) Code Generation (EvalPlus)

**MBPP Results (Python, 378 problems)**

| Model              | MBPP (base) | MBPP+ (extended) | Average   |
| :----------------- | ----------: | ---------------: | --------: |
| MiniMax-M2-BF16    | **73.8%**   | **64.0%**        | 68.9%     |
| MiniMax-M2-THRIFT  | **70.1%**   | **60.1%**        | 65.1%     |
| **Δ (Difference)** | **-3.7%**   | **-3.9%**        | **-3.8%** |

**HumanEval Results (164 problems)**

| Model              | HumanEval (base) | HumanEval+ (extended) | Average   |
| :----------------- | ---------------: | --------------------: | --------: |
| MiniMax-M2-BF16    | **72.6%**        | **71.3%**             | 72.0%     |
| MiniMax-M2-THRIFT  | **65.2%**        | **63.4%**             | 64.3%     |
| **Δ (Difference)** | **-7.4%**        | **-7.9%**             | **-7.7%** |

### 3) Math Benchmarks

**GSM8K Results**

| Model              | Accuracy      | Problems | Status               |
| :----------------- | ------------: | -------: | :------------------- |
| MiniMax-M2-BF16    | **92.72%**    | 1,319    | ✅ Complete          |
| MiniMax-M2-THRIFT  | **93.25%**    | 1,319    | ✅ Complete          |
| **Δ (Difference)** | **+0.53% ⬆️** | -        | **THRIFT Better!** ✨ |

**MATH-500 Results**

| Model             | Overall   | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Status          |
| :---------------- | --------: | ------: | ------: | ------: | ------: | ------: | :-------------- |
| MiniMax-M2-BF16   | **87.2%** | 90.7%   | 95.56%  | 82.86%  | 85.16%  | 85.82%  | ✅ Complete     |
| MiniMax-M2-THRIFT | 🔄        | 🔄      | 🔄      | 🔄      | 🔄      | 🔄      | **In Progress** |

### 4) LiveCodeBench (Live Coding Problems)

| Model                 | pass@1        | Problems | Status               |
| :-------------------- | ------------: | -------: | :------------------- |
| **MiniMax-M2-BF16**   | **35.71%**    | 182      | ✅ Complete          |
| **MiniMax-M2-THRIFT** | **36.81%**    | 182      | ✅ Complete          |
| **Δ (Difference)**    | **+1.10% ⬆️** | -        | **THRIFT Better!** ✨ |
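With one sample per problem, pass@1 is just the solve rate; the general unbiased pass@k estimator from the HumanEval paper (Chen et al.) reduces to it for n = 1. The solved counts below are back-derived from the reported percentages on 182 problems — an assumption for illustration, not published data:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    given n generated samples per problem, c of which are correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Back-derived counts: 65/182 and 67/182 match the table's 35.71% / 36.81%.
for solved in (65, 67):
    print(f"{100 * solved / 182:.2f}%")
```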

---

## 📈 Analysis (Updated)

**Highlights**

* **THRIFT wins** on **GSM8K (+0.53%)**, **LiveCodeBench (+1.10%)**, and **RTE (+4.69%)**.
* **BF16 leads** on broad **MMLU**, **HumanEval**, **MBPP**, and tasks such as **arc_challenge**.

**Compression Trade-off**

* The average knowledge-task drop for THRIFT is ~**4–5%**, with **math preserved or slightly improved**.

**Subject Breakdown (MMLU)**

| Category               | BF16 (Base) | THRIFT-BF16 | Difference | Status            |
| :--------------------- | ----------: | ----------: | ---------: | :---------------- |
| High School Government | 97.93%      | 94.82%      | -3.11%     | ✅ Still Excellent |
| High School Psychology | 95.41%      | 93.58%      | -1.83%     | ✅ Well Preserved  |
| Marketing              | 95.73%      | 91.88%      | -3.85%     | ✅ Good            |
| Professional Medicine  | 92.28%      | 79.78%      | -12.50%    | ⚠️ Notable Drop    |
| Clinical Knowledge     | 92.83%      | 85.66%      | -7.17%     | ⚠️ Moderate Drop   |

---

## sglang Deployment with Python

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.
 
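Once launched, sglang serves an OpenAI-compatible `/v1/chat/completions` endpoint (assumed here on `localhost:8000`, matching the curl example's target). A minimal stdlib sketch that builds such a request — the prompt is a placeholder, and the model string is assumed to be this repo's path:

```python
import json
from urllib import request

def build_chat_request(base_url, model, user_msg):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(
    "http://localhost:8000",              # assumed server host/port
    "VibeStudio/MiniMax-M2-THRIFT",
    "Write a function that reverses a string.",
)
print(req.full_url)
# To send it against a running server:
#   with request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```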
## Benchmarks

See the tables above for the latest **MMLU**, **MBPP**, **HumanEval**, **GSM8K**, **MATH-500**, and **LiveCodeBench** results (updated **November 9, 2025**).

## Research paper
 
 
Model conversion and HF Transformers code by @Qubitum at ModelCloud.

@article{yang2025wanda++,
  title = {Wanda++: Pruning Large Language Models via Regional Gradients},
  author = {Yang, Yifan and Zhen, Kai and Ganesh, Bhavana and Galstyan, Aram and Huybrechts, Goeric and Müller, Markus and Kübler, Jonas M. and Swaminathan, Rupak Vignesh and Mouchtaris, Athanasios and Bodapati, Sravan Babu and Susanj, Nathan and Zhang, Zheng and FitzGerald, Jack and Kumar, Abhishek},
  journal = {arXiv preprint arXiv:2503.04992},
  year = {2025},
  eprinttype = {arXiv},
}
 
@article{sun2023wanda,
  title = {A Simple and Effective Pruning Approach for Large Language Models},
  author = {Sun, Mingjie and Liu, Zhuang and Bair, Anna and Kolter, J. Zico},
  journal = {arXiv preprint arXiv:2306.11695},
  year = {2023},
  eprinttype = {arXiv},
}