---
language:
- en
library_name: mlx
tags:
- minimax
- MOE
- pruning
- compression
- reap
- cerebras
- code
- function-calling
- mlx
license: apache-2.0
pipeline_tag: text-generation
base_model: 0xSero/MiniMax-M2.1-REAP-50
---

# MiniMax REAP-50 MLX 4-bit

This is a 4-bit quantized version of [0xSero/MiniMax-M2.1-REAP-50](https://huggingface.co/0xSero/MiniMax-M2.1-REAP-50), a REAP expert-pruned MiniMax MoE model, converted to MLX for inference on Apple Silicon.

## Quantization Details
- **Quantization**: 4-bit
- **Format**: MLX SafeTensors
- **Optimization**: Apple Silicon (M-series chips)

## Usage

### Python

```python
from mlx_lm import load, generate

model, tokenizer = load("minimax-reap50-mlx-4bit")

response = generate(
    model, 
    tokenizer, 
    prompt="Write a function to calculate fibonacci numbers",
    max_tokens=500,
    verbose=True
)
print(response)
```

### mlx_lm.server

Start the server:

```bash
mlx_lm.server --model minimax-reap50-mlx-4bit --port 8080
```

Make requests:

```bash
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "prompt": "Write a function to calculate fibonacci numbers",
    "max_tokens": 500
  }'
```

Or use the chat endpoint:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "messages": [
      {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ],
    "max_tokens": 500
  }'
```
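The same chat request can also be issued from Python. The sketch below is a minimal standard-library client, assuming the server above is running on port 8080; `build_chat_payload` and `chat` are hypothetical helper names introduced here for illustration:

```python
import json
import urllib.request

# Assumes mlx_lm.server is running locally on port 8080 (see above).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_payload(prompt: str, max_tokens: int = 500) -> dict:
    """Build the JSON body expected by the OpenAI-compatible chat endpoint."""
    return {
        "model": "default_model",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, max_tokens: int = 500) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(build_chat_payload(prompt, max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `chat("Write a function to calculate fibonacci numbers")` returns the assistant's reply; the response schema follows the OpenAI chat completions format.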

## Trade-offs
- **Memory**: Lowest memory footprint of the available quantizations (~65.5 GB)
- **Quality**: Minor degradation relative to the higher-bit variants
- **Speed**: Fastest inference of the available quantizations