Qwen3 4B - MLX 4-bit Quantized
Custom MLX 4-bit quantization of Qwen/Qwen3-4B optimized for MetalRT GPU inference on Apple Silicon.
Usage
Used by RCLI with the MetalRT engine:
rcli setup # select MetalRT or Both engines
Performance (Apple M3 Max)
| Metric | Value |
|---|---|
| Throughput | 180 tok/s |
| Parameters | 4B |
| Quantization | MLX 4-bit |
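As a rough sanity check on what 4-bit quantization means for memory, the on-disk weight size can be estimated from the two figures in the table above. This is only a back-of-the-envelope sketch: the helper `estimate_weight_bytes` is hypothetical, the parameter count is taken as a nominal 4 billion, and the group size of 64 with a 16-bit scale and 16-bit bias per group are assumptions about the quantization layout, not values stated for this model. Real checkpoints may also keep some layers at higher precision.

```python
def estimate_weight_bytes(
    num_params: int,
    bits: int = 4,                      # bits per quantized weight (from the table)
    group_size: int = 64,               # ASSUMPTION: weights quantized in groups of 64
    overhead_bits_per_group: int = 32,  # ASSUMPTION: 16-bit scale + 16-bit bias per group
) -> int:
    """Estimate quantized weight storage: packed values plus per-group metadata."""
    packed_bits = num_params * bits
    groups = -(-num_params // group_size)  # ceiling division
    overhead_bits = groups * overhead_bits_per_group
    return (packed_bits + overhead_bits) // 8


if __name__ == "__main__":
    nominal_params = 4_000_000_000  # ASSUMPTION: "4B" read as exactly 4e9 parameters
    size = estimate_weight_bytes(nominal_params)
    print(f"Estimated quantized weights: {size / 1e9:.2f} GB")
    print(f"Same weights at 16-bit:      {nominal_params * 2 / 1e9:.2f} GB")
```

Under these assumptions the quantized weights come to roughly a quarter to a third of the 16-bit size, before accounting for the KV cache and activations at inference time.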
License
Model weights: Apache 2.0 (Alibaba Qwen)
MetalRT engine: Proprietary (RunAnywhere, Inc.)