---
license: apache-2.0
base_model: MiniMaxAI/MiniMax-M2.1
tags:
- minimax
- moe
- reap
- pruned
- text-generation
library_name: transformers
pipeline_tag: text-generation
---

# MiniMax-M2.1-REAP-30

**30% expert-pruned MiniMax-M2.1 using REAP (Router-weighted Expert Activation Pruning)**

| Property | Value |
|----------|-------|
| Base Model | [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1) |
| Parameters | ~162B |
| Experts | 180/256 (~70% retained) |
| Architecture | MoE (Mixture of Experts) |
| Precision | BF16 |
| VRAM Required | ~324 GB |
| Stability | **0 loops** in stress tests |

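The VRAM figure follows directly from the parameter count at BF16 (2 bytes per weight). A quick back-of-the-envelope check, covering weights only (KV cache and activations come on top):

```python
params = 162e9        # ~162B parameters after 30% expert pruning
bytes_per_param = 2   # BF16 = 16 bits = 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB")  # prints "324 GB"
```
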
## Stress Test Results

Tested at 4 temperatures (0.0, 0.2, 0.7, 1.0) across 6 prompt types (24 total tests):

| Temperature | math_word | reasoning | code | json | instruction | creative |
|-------------|-----------|-----------|------|------|-------------|----------|
| 0.0 | OK | OK | OK | OK | OK | OK |
| 0.2 | OK | OK | OK | OK | OK | OK |
| 0.7 | OK | OK | OK | OK | OK | OK |
| 1.0 | OK | OK | OK | OK | OK | OK |

**Result: 24/24 tests passed, 0 loops detected**

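"Loops" here means degenerate repetition in sampled output, the typical failure mode after expert pruning. A minimal detector in that spirit (a hypothetical sketch, not the actual test harness) flags any token window that repeats back to back:

```python
def has_loop(text: str, n: int = 5, min_repeats: int = 3) -> bool:
    """Flag degenerate repetition: the same n-token window
    appearing min_repeats times consecutively."""
    tokens = text.split()
    for i in range(len(tokens) - n * min_repeats + 1):
        window = tokens[i:i + n]
        if all(tokens[i + k * n:i + (k + 1) * n] == window
               for k in range(1, min_repeats)):
            return True
    return False

print(has_loop("the model keeps saying " * 6, n=4))  # True: 4-token phrase loops
print(has_loop("a single coherent sentence, no repetition"))  # False
```

A check along these lines, swept over the generations at each temperature, yields loop counts comparable to those reported above.
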
## Extended High-Temperature Testing

Additional tests were run at temperatures 0.5, 0.8, 0.9, and 1.2; results are in `stress_test_results.json`.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "0xSero/MiniMax-M2.1-REAP-30",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "0xSero/MiniMax-M2.1-REAP-30",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## DynamicCache Compatibility Fix (transformers 4.55+)

If you encounter `TypeError: CacheLayerMixin.__init__() got an unexpected keyword argument`, apply this patch before loading the model:

```python
from transformers import cache_utils

# Wrap DynamicCache.__init__ so MiniMax configs don't forward
# keyword arguments that newer cache layers no longer accept.
_orig = cache_utils.DynamicCache.__init__

def _patched(self, *args, **kwargs):
    cfg = kwargs.get("config")
    if cfg and "minimax" in str(getattr(cfg, "model_type", "")):
        # Drop the incompatible kwargs and fall back to a plain cache.
        kwargs.pop("config", None)
        kwargs.pop("max_cache_len", None)
        kwargs.pop("max_batch_size", None)
        return _orig(self, None)
    return _orig(self, *args, **kwargs)

cache_utils.DynamicCache.__init__ = _patched
```

## Model Comparison

| Model | Experts | Loops | Size | Status |
|-------|---------|-------|------|--------|
| [MiniMax-M2.1-REAP-20](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-20-REPAIR-IN-PROGRESS) | 204 | 1 | 185B | Deprecated |
| **MiniMax-M2.1-REAP-30** | **180** | **0** | **162B** | **Recommended** |
| [MiniMax-M2.1-REAP-40](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-40) | 154 | 0 | 139B | Recommended |
| [MiniMax-M2.1-REAP-50](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-50-REPAIR-IN-PROGRESS) | 128 | 2 | 116B | Deprecated |

## Quantized Versions

- **MiniMax-M2.1-REAP-40-W4A16** (coming soon): 4-bit weights, ~58 GB

## REAP Methodology

REAP (Router-weighted Expert Activation Pruning) uses calibration data to identify which experts are most important based on router activation patterns. Unlike random or magnitude-based pruning, REAP preserves the experts that are actually used during inference.

**Calibration Dataset**: 2098 samples

- pile-10k: 498 samples (general text)
- evol-codealpaca: 800 samples (code generation)
- xlam-function-calling: 800 samples (function calling)
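
To make the selection criterion concrete, here is a deliberately simplified sketch on synthetic data. It keeps only the router-weight term; the actual REAP saliency (per the Cerebras implementation) also accounts for expert activation magnitudes, and real runs record gate weights during forward passes over the calibration set rather than simulating them:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, keep_count = 10_000, 256, 180

# Simulated router outputs over a calibration set: for each token,
# softmax gate weights across all experts.
logits = rng.normal(size=(num_tokens, num_experts))
gates = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Router-weighted importance: each expert's accumulated gate mass
# over the calibration tokens.
importance = gates.sum(axis=0)

# Keep the 180 highest-scoring experts (30% pruning of 256).
keep = np.argsort(importance)[::-1][:keep_count]
prune = np.setdiff1d(np.arange(num_experts), keep)
print(f"retained {len(keep)} experts, pruned {len(prune)}")
```

Because low-importance experts receive little router mass on calibration-like inputs, removing them perturbs the output distribution far less than random or magnitude-based pruning would.
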

## Acknowledgments

- Sponsored by [Prime Intellect](https://www.primeintellect.ai/)
- REAP implementation by [Cerebras](https://github.com/Cerebras/reap)
- Base model by [MiniMax](https://huggingface.co/MiniMaxAI)