MiniMax REAP-50 MLX 4-bit
This is a 4-bit quantized version of the MiniMax REAP-50 model optimized for Apple Silicon using MLX.
Quantization Details
- Quantization: 4-bit
- Format: MLX SafeTensors
- Optimization: Apple Silicon (M-series chips)
Usage
Python
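from mlx_lm import load, generate

# Accepts a local path or a Hugging Face repo ID
# (this repo: AlexGS74/MiniMax-M2.1-REAP-50-mlx-4bit).
model, tokenizer = load("minimax-reap50-mlx-4bit")

response = generate(
    model,
    tokenizer,
    prompt="Write a function to calculate fibonacci numbers",
    max_tokens=500,
    verbose=True,  # print the text and generation stats while running
)
print(response)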
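The call above passes a raw string as the prompt. If the checkpoint is chat-tuned, it is usually better to wrap the request in the tokenizer's chat template first. The following is a minimal sketch, assuming the tokenizer shipped with this model defines a chat template; `build_messages` and `chat_generate` are hypothetical helper names for illustration, not part of mlx_lm.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return an OpenAI-style message list for a single-turn request."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def chat_generate(model_path, user_prompt, max_tokens=500):
    """Load the model and generate a reply through the chat template."""
    # Imported lazily so build_messages can be used without MLX installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_path)
    # Assumption: the tokenizer ships a chat template for this model.
    prompt = tokenizer.apply_chat_template(
        build_messages(user_prompt), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

On an Apple Silicon machine with the weights available, `chat_generate("minimax-reap50-mlx-4bit", "Write a function to calculate fibonacci numbers")` would return the generated reply as a string.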
mlx_lm.server
Start the server:
mlx_lm.server --model minimax-reap50-mlx-4bit --port 8080
Make requests:
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "prompt": "Write a function to calculate fibonacci numbers",
    "max_tokens": 500
  }'
Or use the chat endpoint:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "messages": [
      {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ],
    "max_tokens": 500
  }'
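The same chat request can be sent from Python. The sketch below uses only the standard library and assumes the server is running locally on port 8080 as started above, and that it returns an OpenAI-style body with the reply under `choices[0].message.content`. The helper names `build_chat_payload`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches the --port 8080 used above


def build_chat_payload(user_prompt, max_tokens=500, model="default_model"):
    """Build the JSON body for the /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
    }


def extract_reply(response_body):
    """Pull the assistant text out of the response.

    Assumption: the body follows the OpenAI chat completion shape.
    """
    return response_body["choices"][0]["message"]["content"]


def chat(user_prompt, max_tokens=500, base_url=BASE_URL, timeout=600):
    """POST a single-turn chat request and return the reply text."""
    payload = json.dumps(build_chat_payload(user_prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running, `chat("Write a function to calculate fibonacci numbers")` would return the generated text. The generous default timeout is a guess to allow for slow first-token latency on a large model.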
Trade-offs
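- Memory: Smallest memory footprint compared with higher-bit quantizations (~65.5 GB)
- Quality: Acceptable, with minor degradation relative to the unquantized model
- Speed: Fastest inference compared with higher-bit quantizations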
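The ~65.5 GB figure can be sanity-checked with back-of-the-envelope arithmetic from the 116B parameter count reported for this model. The sketch below rests on two assumptions that this card does not confirm: that the weights use a grouped affine layout with group size 64 and one 16-bit scale plus one 16-bit bias per group (about 4.5 bits per weight), and that every parameter is quantized. The function name `estimate_quantized_size_gb` is hypothetical.

```python
def estimate_quantized_size_gb(
    num_params, bits=4, group_size=64, scale_bits=16, bias_bits=16
):
    """Rough size of grouped-quantized weights, in decimal gigabytes.

    Assumption: each group of `group_size` weights stores one scale and
    one bias alongside the packed `bits`-wide values.
    """
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


size_gb = estimate_quantized_size_gb(116e9)
print(f"Estimated weight size: {size_gb:.2f} GB")  # prints 65.25 GB
```

The estimate lands close to the quoted footprint. Any remaining gap would be consistent with some tensors being kept at higher precision, though that is a guess. Runtime memory will be higher than the weight size once the KV cache and activations are included.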
Model size: 116B params
Tensor types: BF16, U32
Model: AlexGS74/MiniMax-M2.1-REAP-50-mlx-4bit
Base model: 0xSero/MiniMax-M2.1-REAP-50