---
license: apache-2.0
base_model: MiniMaxAI/MiniMax-M2.1
tags:
- minimax
- moe
- reap
- pruned
- text-generation
library_name: transformers
pipeline_tag: text-generation
---
> [!TIP]
> Support this work: **[donate.sybilsolutions.ai](https://donate.sybilsolutions.ai)**
> 
> REAP surfaces: [GLM](https://huggingface.co/spaces/0xSero/reap-glm-family) | [MiniMax](https://huggingface.co/spaces/0xSero/reap-minimax-family) | [Qwen](https://huggingface.co/spaces/0xSero/reap-qwen-family) | [Gemma](https://huggingface.co/spaces/0xSero/reap-gemma-family) | [Paper](https://arxiv.org/abs/2510.13999) | [Code](https://github.com/CerebrasResearch/reap) | [PR17](https://github.com/CerebrasResearch/reap/pull/17) | [Cerebras Collection](https://huggingface.co/collections/cerebras/cerebras-reap)

# MiniMax-M2.1-REAP-40

**40% expert-pruned MiniMax-M2.1 using REAP (Router-weighted Expert Activation Pruning)**

| Property | Value |
|----------|-------|
| Base Model | [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1) |
| Parameters | ~139B |
| Experts | 154/256 (60% retained) |
| Architecture | MoE (Mixture of Experts) |
| Precision | BF16 |
| VRAM Required | ~278GB |
| Stability | **0 loops** in stress tests |

## Stress Test Results

Tested at 4 temperatures (0.0, 0.2, 0.7, 1.0) across 6 prompt types (24 total tests):

| Temperature | math_word | reasoning | code | json | instruction | creative |
|-------------|-----------|-----------|------|------|-------------|----------|
| 0.0 | OK | OK | OK | OK | OK | OK |
| 0.2 | OK | OK | OK | OK | OK | OK |
| 0.7 | OK | OK | OK | OK | OK | OK |
| 1.0 | OK | OK | OK | OK | OK | OK |

**Result: 24/24 tests passed, 0 loops detected**

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "0xSero/MiniMax-M2.1-REAP-40",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "0xSero/MiniMax-M2.1-REAP-40",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## DynamicCache Compatibility Fix (transformers 4.55+)

If you encounter `TypeError: CacheLayerMixin.__init__() got an unexpected keyword argument`, apply this patch before loading the model:

```python
from transformers import cache_utils

_orig = cache_utils.DynamicCache.__init__

def _patched(self, *args, **kwargs):
    cfg = kwargs.get("config")
    # Newer transformers versions pass config/max_cache_len/max_batch_size
    # into DynamicCache; MiniMax's remote code does not expect them, so
    # strip these kwargs and fall back to the plain constructor.
    if cfg is not None and "minimax" in str(getattr(cfg, "model_type", "")):
        kwargs.pop("config", None)
        kwargs.pop("max_cache_len", None)
        kwargs.pop("max_batch_size", None)
        return _orig(self, None)
    return _orig(self, *args, **kwargs)

cache_utils.DynamicCache.__init__ = _patched
```

## Model Comparison

| Model | Experts | Loops | Size | Status |
|-------|---------|-------|------|--------|
| [MiniMax-M2.1-REAP-20](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-20-REPAIR-IN-PROGRESS) | 204 | 1 | 185B | Deprecated |
| [MiniMax-M2.1-REAP-30](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-30) | 180 | 0 | 162B | Recommended |
| **MiniMax-M2.1-REAP-40** | **154** | **0** | **139B** | **Recommended** |
| [MiniMax-M2.1-REAP-50](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-50-REPAIR-IN-PROGRESS) | 128 | 2 | 116B | Deprecated |

## Quantized Versions

- **MiniMax-M2.1-REAP-40-W4A16** (Coming Soon) - 4-bit weights, ~58GB VRAM

## Why 40% Pruning?

The 40% pruning ratio offers the best balance of:
- **Size reduction**: ~139B parameters vs ~230B in the original (roughly 40% smaller, consistent with pruning 40% of the experts)
- **VRAM savings**: ~278GB vs ~460GB in BF16 (weights fit on 4x H100 80GB)
- **Stability**: 0 loops in comprehensive stress testing
- **Performance**: Minimal quality degradation from strategic expert selection
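
The VRAM figure follows directly from BF16 storage, which uses 2 bytes per parameter. A quick sanity check (weights only, excluding KV cache and activations):

```python
def bf16_weight_gb(num_params: float) -> float:
    """Approximate weight memory in GB for BF16 storage (2 bytes per parameter)."""
    return num_params * 2 / 1e9  # bytes -> GB (decimal)

print(bf16_weight_gb(139e9))  # ~278 GB for this pruned model's weights alone
```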

## REAP Methodology

REAP (Router-weighted Expert Activation Pruning) uses calibration data to identify which experts are most important based on router activation patterns. Unlike random or magnitude-based pruning, REAP preserves the experts that are actually used during inference.
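
As an illustrative sketch only (not the Cerebras implementation; function and variable names here are hypothetical, and the exact saliency criterion is specified in the REAP paper), the selection step can be pictured as scoring each expert by its router-weighted contribution over calibration tokens, then keeping the top-scoring experts:

```python
import torch

def reap_saliency(router_weights: torch.Tensor, expert_norms: torch.Tensor) -> torch.Tensor:
    """Per-expert saliency: mean over calibration tokens of the router weight
    times the magnitude of that expert's output contribution.
    Shapes: both inputs are [num_tokens, num_experts]."""
    return (router_weights * expert_norms).mean(dim=0)

def select_experts(saliency: torch.Tensor, keep: int) -> torch.Tensor:
    """Indices of the `keep` highest-saliency experts, in ascending order."""
    return torch.sort(torch.topk(saliency, k=keep).indices).values

# Toy example: 256 experts, retain 154 (the 40% pruning used for this model).
torch.manual_seed(0)
router = torch.rand(1000, 256).softmax(dim=-1)  # routing weights per token
norms = torch.rand(1000, 256)                   # per-token expert output norms
kept = select_experts(reap_saliency(router, norms), keep=154)
```

Experts outside `kept` would then be dropped from each MoE layer, with the router's output dimension shrunk to match.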

**Calibration Dataset**: 2098 samples
- pile-10k: 498 samples (general text)
- evol-codealpaca: 800 samples (code generation)
- xlam-function-calling: 800 samples (function calling)

## Acknowledgments

- Sponsored by [Prime Intellect](https://www.primeintellect.ai/)
- REAP implementation by [Cerebras](https://github.com/Cerebras/reap)
- Base model by [MiniMax](https://huggingface.co/MiniMaxAI)

## Support

If this work is useful, support Sybil Solutions here: [https://donate.sybilsolutions.ai](https://donate.sybilsolutions.ai)


<!-- SERO_MANAGED_TOP_LINKS_START -->
## Support and links
- Donate: https://donate.sybilsolutions.ai
- X: https://x.com/0xsero
- GitHub: https://github.com/0xsero
<!-- SERO_MANAGED_TOP_LINKS_END -->

## Sponsors

Thank you to our kind sponsors; this work wouldn't be possible without them:

- Nvidia
- TNG Technology
- Lambda
- Prime Intellect
- HotAisle