Llama 3.2 3B — MLX 4-bit Quantized

Custom MLX 4-bit quantization of meta-llama/Llama-3.2-3B-Instruct, optimized for MetalRT GPU inference on Apple Silicon.

Usage

Used by RCLI with the MetalRT engine:

rcli setup          # select MetalRT or Both engines

Metric	Value
Parameters	3B
Quantization	MLX 4-bit
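
As a rough guide to what these two numbers mean for memory, the sketch below estimates the size of the quantized weights. It is an approximation under stated assumptions, not a measurement of this model: the group size of 64 and the 16-bit scale plus 16-bit bias per group are assumed values (this card does not state the settings used for the custom quantization), and 3B is treated as a nominal parameter count. The function name is hypothetical.

```python
def quantized_weight_size_gb(n_params, bits=4, group_size=64, overhead_bits=32):
    """Estimate weight storage in GB for group-wise quantization.

    Assumptions (not taken from this model card):
      - every weight is stored in `bits` bits
      - each group of `group_size` weights carries `overhead_bits`
        of metadata (here a 16-bit scale plus a 16-bit bias)
    """
    bits_per_weight = bits + overhead_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


# Nominal 3B parameters at 4-bit with the assumed group settings
print(f"{quantized_weight_size_gb(3e9):.2f} GB")

# Same parameter count stored as unquantized 16-bit weights, for comparison
print(f"{quantized_weight_size_gb(3e9, bits=16, overhead_bits=0):.2f} GB")
```

The estimate covers weights only. The KV cache, activations, and runtime overhead add to the total at inference time.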

Model weights: Llama 3.2 Community License (Meta)
MetalRT engine: Proprietary (RunAnywhere, Inc.)
