---
language:
- en
library_name: mlx
tags:
- minimax
- MOE
- pruning
- compression
- reap
- cerebras
- code
- function-calling
- mlx
license: apache-2.0
pipeline_tag: text-generation
base_model: 0xSero/MiniMax-M2.1-REAP-50
---

# MiniMax REAP-50 MLX 4-bit

This is a 4-bit quantized version of the MiniMax REAP-50 model, optimized for Apple Silicon using MLX.

## Quantization Details

- **Quantization**: 4-bit
- **Format**: MLX SafeTensors
- **Optimization**: Apple Silicon (M-series chips)

## Usage

### Python

```python
from mlx_lm import load, generate

model, tokenizer = load("minimax-reap50-mlx-4bit")

response = generate(
    model,
    tokenizer,
    prompt="Write a function to calculate fibonacci numbers",
    max_tokens=500,
    verbose=True
)
print(response)
```

### mlx_lm.server

Start the server:

```bash
mlx_lm.server --model minimax-reap50-mlx-4bit --port 8080
```

Make requests:

```bash
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "prompt": "Write a function to calculate fibonacci numbers",
    "max_tokens": 500
  }'
```

Or use the chat endpoint:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default_model",
    "messages": [
      {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ],
    "max_tokens": 500
  }'
```

## Trade-offs

- **Memory**: Lowest memory footprint of the available quantizations (~65.5 GB)
- **Quality**: Minor quality degradation relative to higher-precision variants
- **Speed**: Fastest inference of the available quantizations
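The ~65.5 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The parameter count and effective bit width below are assumptions, not values from this model card: roughly 115B parameters remaining after REAP expert pruning, and roughly 4.5 effective bits per weight (4-bit weights plus the group scales and biases that MLX quantization stores alongside them).

```python
# Rough estimate of 4-bit weight memory.
# Assumptions (hypothetical, not from the model card):
#   - ~115e9 parameters after REAP-50 pruning
#   - ~4.5 effective bits/weight (4-bit values + group scale/bias overhead)
params = 115e9
bits_per_weight = 4.5

weight_bytes = params * bits_per_weight / 8
weight_gb = weight_bytes / 1e9  # decimal gigabytes

print(f"Estimated weight memory: {weight_gb:.1f} GB")
# Estimated weight memory: 64.7 GB
```

Under these assumptions the estimate lands near the stated ~65.5 GB; actual usage will also include KV-cache and activation memory, which grow with context length.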