How to use runanywhere/qwen3_0.6B_MLX_4bit with MLX:
```shell
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir qwen3_0.6B_MLX_4bit runanywhere/qwen3_0.6B_MLX_4bit
```
Custom MLX 4-bit quantization of Qwen/Qwen3-0.6B optimized for MetalRT GPU inference on Apple Silicon.
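To show roughly what 4-bit quantization saves at this scale, the sketch below estimates weight storage for a 0.6B-parameter model. This card does not state the quantization settings, so the sketch assumes that every weight is quantized group-wise with MLX's default group size of 64 and that each group stores one fp16 scale and one fp16 bias. The helper `quantized_size_bytes` is hypothetical. Treat the result as an approximation: the published files may differ, because some tensors (for example, norms) are usually kept at higher precision.

```python
# Rough weight-storage estimate for a group-wise 4-bit quantized model.
# ASSUMPTIONS (not stated in this model card): every parameter is quantized,
# group_size=64 (MLX's default), and each group stores one fp16 scale and
# one fp16 bias. Real checkpoints keep some tensors (e.g. norms) unquantized.


def quantized_size_bytes(num_params: float, bits: int = 4,
                         group_size: int = 64, scale_bits: int = 16) -> float:
    """Approximate bytes needed to store num_params group-quantized weights."""
    # Each group of `group_size` weights carries one scale and one bias.
    overhead_bits_per_weight = 2 * scale_bits / group_size
    bits_per_weight = bits + overhead_bits_per_weight
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    params = 0.6e9
    q4 = quantized_size_bytes(params)
    fp16 = params * 2  # 2 bytes per fp16 weight, for comparison
    print(f"~{q4 / 1e6:.0f} MB at 4-bit vs ~{fp16 / 1e6:.0f} MB at fp16")
```

Under these assumptions, the effective cost is about 4.5 bits per weight, or roughly a quarter of the fp16 footprint.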
Used by RCLI with the MetalRT engine:
```shell
rcli setup  # select MetalRT or Both engines
```
| Metric | Value |
|---|---|
| Throughput | 550 tok/s |
| Time to first token (TTFT) | 8.9 ms |
| Parameters | 0.6B |
| Quantization | MLX 4-bit |
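The two headline numbers can be combined into a rough end-to-end latency estimate for n generated tokens: total ≈ TTFT + (n − 1) / throughput. This is an illustration only. It assumes that the 550 tok/s figure is a steady decode rate that applies from the second token onward, and that both figures were measured under the same hardware and prompt conditions, which the table does not specify. The helper `estimated_latency_ms` is hypothetical.

```python
# Back-of-the-envelope end-to-end latency from the table's headline numbers.
# ASSUMPTION: decode runs at a constant 550 tok/s right after the first token,
# and TTFT and throughput were measured under the same (unstated) conditions.


def estimated_latency_ms(output_tokens: int, ttft_ms: float = 8.9,
                         tokens_per_s: float = 550.0) -> float:
    """TTFT plus steady-state decode time for the remaining tokens."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be >= 1")
    decode_ms = (output_tokens - 1) / tokens_per_s * 1000.0
    return ttft_ms + decode_ms


if __name__ == "__main__":
    for n in (1, 100, 500):
        print(f"{n:>4} tokens: ~{estimated_latency_ms(n):.1f} ms")
```

At 550 tok/s, each additional token costs about 1.8 ms. Under these assumptions, a 100-token reply would complete in roughly 0.19 s.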
- Model weights: Apache 2.0 (Alibaba Qwen)
- MetalRT engine: Proprietary (RunAnywhere, Inc.)