---
language:
- en
library_name: mlx
tags:
- minimax
- MOE
- pruning
- compression
- reap
- cerebras
- code
- function-calling
- mlx
license: apache-2.0
pipeline_tag: text-generation
base_model: 0xSero/MiniMax-M2.1-REAP-50
---
# MiniMax REAP-50 MLX 4-bit
This is a 4-bit quantized version of the MiniMax REAP-50 model optimized for Apple Silicon using MLX.
## Quantization Details
- **Quantization**: 4-bit
- **Format**: MLX SafeTensors
- **Optimization**: Apple Silicon (M-series chips)
## Usage
### Python
Install the dependencies first with `pip install mlx-lm`, then:
```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the local model directory
model, tokenizer = load("minimax-reap50-mlx-4bit")

response = generate(
    model,
    tokenizer,
    prompt="Write a function to calculate fibonacci numbers",
    max_tokens=500,
    verbose=True,
)
print(response)
```
### mlx_lm.server
Start the server:
```bash
mlx_lm.server --model minimax-reap50-mlx-4bit --port 8080
```
Make requests:
```bash
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "prompt": "Write a function to calculate fibonacci numbers",
    "max_tokens": 500
  }'
```
Or use the chat endpoint:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "messages": [
      {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ],
    "max_tokens": 500
  }'
```
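The same chat endpoint can be called from Python using only the standard library. This is a minimal sketch, not an official client: the URL and port match the server command above, `default_model` is the placeholder name the server accepts, and the helper names (`build_payload`, `chat`) are illustrative.

```python
import json
from urllib.request import Request, urlopen

# Assumes mlx_lm.server is running locally on port 8080 (see above)
API_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(user_message: str, max_tokens: int = 500) -> dict:
    """Construct an OpenAI-style chat completion request body."""
    return {
        "model": "default_model",
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def chat(user_message: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = Request(
        API_URL,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-compatible response shape
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Write a function to calculate fibonacci numbers"))
```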
## Trade-offs
- **Memory**: Lowest footprint of the available quantizations (~65.5 GB)
- **Quality**: Minor degradation relative to higher-precision variants
- **Speed**: Fastest inference of the available quantizations
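As a rough sanity check on the memory figure: 4-bit weights cost about 0.5 bytes per parameter, plus a small overhead for the per-group scales and biases that MLX's affine quantization stores alongside them. The sketch below estimates this; the parameter count passed in is an illustrative assumption, not an official figure for this model.

```python
def quantized_size_gb(n_params: float, bits: int = 4, group_size: int = 64) -> float:
    """Estimate the size of group-quantized weights in GiB.

    Assumes each group of `group_size` weights carries one fp16 scale
    and one fp16 bias (4 extra bytes per group), as in MLX's affine
    quantization with its default group size of 64.
    """
    weight_bytes = n_params * bits / 8
    overhead_bytes = (n_params / group_size) * 4  # scale + bias per group
    return (weight_bytes + overhead_bytes) / 1024**3


# Illustrative parameter count only -- plug in the real total to check
print(f"~{quantized_size_gb(115e9):.1f} GB at 4-bit")
```

Activations, the KV cache, and any unquantized layers add to this, so the runtime footprint is somewhat higher than the raw weight estimate.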