license: apache-2.0
tags:
  - mlx
  - 4bit
  - qwen3
  - metalrt
  - apple-silicon

Qwen3 4B — MLX 4-bit Quantized

Custom MLX 4-bit quantization of Qwen/Qwen3-4B optimized for MetalRT GPU inference on Apple Silicon.
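As a rough illustration of what 4-bit quantization means, the sketch below applies group-wise affine quantization to a handful of weights in plain Python. It is an assumption-laden toy, not the procedure used to produce these weights: the helper names `quantize_group` and `dequantize_group`, the sample values, and the min/max scaling scheme are all hypothetical, and the actual custom quantization may differ.

```python
from typing import List, Tuple


def quantize_group(values: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Map one group of weights to integer codes in [0, 2**bits - 1].

    Hypothetical helper: a generic affine (scale + bias) scheme,
    not necessarily what was used for this model.
    """
    levels = (1 << bits) - 1              # 15 steps for 4-bit codes
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Recover approximate weights from codes, scale, and bias."""
    return [c * scale + bias for c in codes]


# Made-up sample weights for illustration only.
weights = [-0.30, -0.10, 0.00, 0.05, 0.20, 0.45]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)

# Rounding to the nearest code bounds the error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as a small integer code plus a per-group scale and bias, which is where the memory saving over 16-bit weights comes from.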

Usage

This model is used by RCLI with the MetalRT engine. Run setup and choose the MetalRT or Both engines option:

rcli setup          # select MetalRT or Both engines

Performance (Apple M3 Max)

| Metric | Value |
| --- | --- |
| Throughput | 180 tok/s |
| Parameters | 4B |
| Quantization | MLX 4-bit |
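To put these figures in context, the following back-of-the-envelope sketch estimates the raw weight footprint and the time to generate a response. It rests on assumptions not stated above: that "4B" means roughly 4e9 parameters, that every weight is stored at exactly 4 bits (ignoring per-group scales, biases, and any unquantized layers), and that 180 tok/s is a sustained generation rate. The function names are hypothetical.

```python
PARAMS = 4e9               # assumed nominal parameter count for "4B"
BITS_PER_WEIGHT = 4        # from the quantization row
TOKENS_PER_SECOND = 180.0  # from the throughput row (assumed sustained)


def weight_footprint_gb(params: float, bits: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), excluding metadata."""
    return params * bits / 8 / 1e9


def generation_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Time to emit num_tokens at a constant generation rate."""
    return num_tokens / tok_per_s


print(f"weights: ~{weight_footprint_gb(PARAMS, BITS_PER_WEIGHT):.1f} GB")
print(f"540 tokens: ~{generation_seconds(540, TOKENS_PER_SECOND):.1f} s")
```

Real memory use will be higher than the raw estimate once quantization metadata, activations, and the KV cache are included.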

License

Model weights: Apache 2.0 (Alibaba Qwen)

MetalRT engine: Proprietary (RunAnywhere, Inc.)

Contact

founder@runanywhere.ai | https://runanywhere.ai