Qwen3 0.6B - MLX 4-bit Quantized

Custom MLX 4-bit quantization of Qwen/Qwen3-0.6B optimized for MetalRT GPU inference on Apple Silicon.
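To illustrate what group-wise 4-bit quantization does to the weights, here is a minimal NumPy sketch of an affine scheme with a per-group scale and bias, in the spirit of MLX's grouped quantization. This is a hypothetical illustration (function names, group size, and the round-to-nearest scheme are assumptions), not the actual MLX kernel.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    """Affine 4-bit quantization: each group of `group_size` weights
    shares one scale and one bias (minimum). Hypothetical sketch."""
    flat = w.reshape(-1, group_size)
    w_min = flat.min(axis=1, keepdims=True)
    w_max = flat.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0.0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((flat - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, bias):
    """Reconstruct approximate float weights from 4-bit codes."""
    return q.astype(np.float32) * scale + bias

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
q, scale, bias = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, bias).reshape(w.shape)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2 per group
```

Round-to-nearest keeps the reconstruction error of each weight within half a quantization step, which is why small models like this one stay usable at 4 bits.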

Usage

Used by RCLI with the MetalRT engine:

rcli setup          # select MetalRT or Both engines

Performance (Apple M3 Max)

| Metric       | Value     |
|--------------|-----------|
| Throughput   | 550 tok/s |
| TTFT         | 8.9 ms    |
| Parameters   | 0.6B      |
| Quantization | MLX 4-bit |
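The table's two latency numbers combine into a rough end-to-end estimate: time to first token, plus decode time at the steady-state throughput. A small sketch (the helper name is hypothetical; the constants are the M3 Max figures above):

```python
# Measured on Apple M3 Max (values from the table above).
TTFT_MS = 8.9      # time to first token, milliseconds
TOK_PER_S = 550.0  # steady-state decode throughput

def est_latency_ms(n_tokens: int) -> float:
    """Rough wall-clock estimate for generating n_tokens."""
    return TTFT_MS + n_tokens / TOK_PER_S * 1000.0

print(round(est_latency_ms(100), 1))  # ~190.7 ms for a 100-token reply
```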

License

Model weights: Apache 2.0 (Alibaba Qwen)
MetalRT engine: Proprietary (RunAnywhere, Inc.)

Contact

founder@runanywhere.ai | https://runanywhere.ai
