# Qwen3-Coder-Next-REAP-48B-A3B-4bit-mlx
This model was converted to MLX 4-bit format from lovedheart/Qwen3-Coder-Next-REAP-48B-A3B-GGUF using mlx-lm version 0.30.6.
## Model Specifications

- Type: Causal Language Model
- Number of Parameters: 48B in total and 3B activated
- Number of Layers: 48
- Context Length: 262,144 tokens natively, extensible up to 1,010,000
- Compression Method: REAP (Router-weighted Expert Activation Pruning)
- Compression Ratio: 40% expert pruning
- Quantization: 4-bit, group size 64
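"Group size 64" means each run of 64 consecutive weights shares one scale and one offset, so only 4-bit integer codes plus a small amount of per-group metadata are stored. A minimal pure-Python sketch of this affine group-quantization idea (an illustration of the scheme, not MLX's actual kernel):

```python
# Sketch of 4-bit affine quantization with group size 64 (illustrative only;
# MLX's real implementation packs codes and metadata differently).

def quantize_group(weights, bits=4):
    """Map one group of weights to integer codes sharing a scale and bias."""
    levels = (1 << bits) - 1              # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0     # avoid div-by-zero for constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo                   # 4-bit codes + one scale + one bias

def dequantize_group(q, scale, bias):
    return [scale * v + bias for v in q]

group = [0.1 * i for i in range(64)]      # one group of 64 weights
q, scale, bias = quantize_group(group)
approx = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, approx))
print(f"max reconstruction error: {max_err:.4f}")  # bounded by scale / 2
```

Smaller groups track local weight ranges more tightly (lower error) at the cost of more scale/bias metadata; group size 64 is a common middle ground.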
## Recommended Inference Settings
| Parameter | Value |
|---|---|
| Temperature | 1.0 |
| Top-K | 40 |
| Top-P | 0.95 |
| Min-P | 0.01 |
| KV Cache Quantization | Off |
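Roughly, these parameters combine as follows: temperature rescales the logits, Top-K caps the candidate pool at the 40 most likely tokens, Top-P stops once 95% of the probability mass is covered, and Min-P drops any token whose probability falls below 1% of the best token's. A self-contained sketch of that filtering logic (the ordering and names here are assumptions for illustration, not mlx-lm's exact implementation):

```python
import math

def filter_logits(logits, temperature=1.0, top_k=40, top_p=0.95, min_p=0.01):
    """Return the set of token indices that survive the sampling filters."""
    scaled = [l / temperature for l in logits]     # temperature scaling
    m = max(scaled)                                # stable softmax
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum, p_max = set(), 0.0, probs[order[0]]
    for rank, i in enumerate(order):
        if rank >= top_k:                # Top-K: at most 40 candidates
            break
        if probs[i] < min_p * p_max:     # Min-P: drop tokens far below the best
            break
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:                 # Top-P: stop at 95% cumulative mass
            break
    return keep

keep = filter_logits([5.0, 4.0, 1.0, 0.0])
print(sorted(keep))  # → [0, 1]: only the two high-probability tokens survive
```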
## Performance
| Hardware | Memory | Speed |
|---|---|---|
| Mac Mini M4 Pro | 64GB | 60+ tokens/s |
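As a sanity check on the 64GB figure: the 4-bit weights alone should occupy roughly 27 GB, leaving ample headroom for the KV cache and activations. The estimate below assumes a 16-bit scale and 16-bit bias per 64-weight group; the exact overhead depends on MLX's storage format.

```python
# Back-of-the-envelope memory estimate for the quantized weights
# (ignores any tensors kept at higher precision, KV cache, activations, etc.)
params = 48e9
bits_per_weight = 4
# assumed: one 16-bit scale + one 16-bit bias per group of 64 weights
overhead_bits = (16 + 16) / 64
total_gb = params * (bits_per_weight + overhead_bits) / 8 / 1e9
print(f"approx. weight memory: {total_gb:.1f} GB")  # → approx. weight memory: 27.0 GB
```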
## Acknowledgements
Thanks to lovedheart for providing the original model.
## Usage

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("toby1991/Qwen3-Coder-Next-REAP-48B-A3B-4bit-mlx")
response = generate(model, tokenizer, prompt="Hello, who are you?", max_tokens=512)
print(response)
```
## Model Tree

toby1991/Qwen3-Coder-Next-REAP-48B-A3B-4bit-mlx is derived from the base model Qwen/Qwen3-Coder-Next.