---
language:
- en
library_name: mlx
tags:
- minimax
- MOE
- pruning
- compression
- reap
- cerebras
- code
- function-calling
- mlx
license: apache-2.0
pipeline_tag: text-generation
base_model: 0xSero/MiniMax-M2.1-REAP-50
---
# MiniMax REAP-50 MLX 4-bit
This is a 4-bit MLX quantization of [MiniMax-M2.1-REAP-50](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-50), optimized for Apple Silicon.
## Quantization Details
- **Quantization**: 4-bit
- **Format**: MLX SafeTensors
- **Optimization**: Apple Silicon (M-series chips)
## Usage
### Python
```python
from mlx_lm import load, generate
model, tokenizer = load("minimax-reap50-mlx-4bit")
response = generate(
    model,
    tokenizer,
    prompt="Write a function to calculate fibonacci numbers",
    max_tokens=500,
    verbose=True,
)
print(response)
```
### mlx_lm.server
Start the server:
```bash
mlx_lm.server --model minimax-reap50-mlx-4bit --port 8080
```
Make requests:
```bash
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "prompt": "Write a function to calculate fibonacci numbers",
    "max_tokens": 500
  }'
```
Or use the chat endpoint:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "messages": [
      {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ],
    "max_tokens": 500
  }'
```
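The same chat endpoint can be called from Python with only the standard library, since `mlx_lm.server` exposes an OpenAI-compatible API. A sketch (the function names here are our own):

```python
import json
import urllib.request

def build_chat_payload(prompt: str, max_tokens: int = 500) -> dict:
    """Build an OpenAI-style chat completion payload for mlx_lm.server."""
    return {
        "model": "default_model",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    """POST to a running server and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage (with the server from the previous section running): `chat("Write a function to calculate fibonacci numbers")`.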
## Trade-offs
- **Memory**: Lowest footprint among the MLX quantizations of this model (~65.5 GB)
- **Quality**: Acceptable, with minor degradation relative to higher-bit variants
- **Speed**: Fastest inference among the quantized variants
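As a rough sanity check on the footprint: 4-bit MLX quantization stores about 4.5 bits per weight once per-group fp16 scales and biases are counted (with the default group size of 64). Assuming the pruned model has on the order of 115B parameters (an assumption for illustration, not a figure from this card), a back-of-envelope estimate lands near the ~65.5 GB above:

```python
def quantized_size_gb(n_params: float, bits: float = 4, group_size: int = 64) -> float:
    """Approximate weight storage in GB: quantized values plus one fp16
    scale and one fp16 bias per group of `group_size` weights."""
    bits_per_weight = bits + 2 * 16 / group_size  # 4 + 0.5 = 4.5 bits at defaults
    return n_params * bits_per_weight / 8 / 1e9

print(round(quantized_size_gb(115e9), 1))  # ~64.7 GB under these assumptions
```

Activations, the KV cache, and framework overhead add to this at runtime, which is consistent with the slightly larger observed figure.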