FastVLM-0.5B (MLX 4-bit)

A pre-converted, fully self-contained MLX version of Apple's FastVLM-0.5B vision-language model, quantized to 4-bit precision.

~530 MB total: vision tower (FastViTHD) + LLM (Qwen2-0.5B) + multi-modal projector, all stored as MLX safetensors.

Why this exists

The official apple/FastVLM-0.5B ships a hybrid CoreML + MLX architecture: the vision tower is distributed as a .mlpackage CoreML file, while only the LLM weights are in safetensors. The mlx_vlm library (v0.3.11+) reimplemented the FastViTHD vision tower in pure MLX but expects all weights in safetensors format.

This repository was created by running mlx_vlm.convert on the original Apple checkpoint to produce a model that works out-of-the-box with mlx_vlm.load(): no CoreML dependency, no manual weight surgery.

Model details

Property         Value
---------------  ---------------------------------
Base model       apple/FastVLM-0.5B
Vision encoder   FastViTHD (pure MLX)
Language model   Qwen2-0.5B
Quantization     4-bit (via mlx_vlm.convert)
Total size       ~530 MB
Format           MLX safetensors
Load time        ~0.6 s on Apple Silicon
Inference        ~0.1 s per image, ~183 tok/s
Platform         Apple Silicon Macs (M1/M2/M3/M4)
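As a rough back-of-the-envelope check on the size figure (a sketch only; the exact on-disk size depends on which layers mlx_vlm leaves unquantized):

```python
def quantized_size_mb(num_params: float, bits: int = 4) -> float:
    """Approximate on-disk size of weights-only quantized parameters, in MB."""
    return num_params * bits / 8 / 1e6

# ~0.5B LLM parameters at 4 bits is roughly 250 MB; the vision tower,
# projector, and any layers kept in higher precision account for the
# rest of the ~530 MB total.
llm_mb = quantized_size_mb(0.5e9, bits=4)
```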

Conversion

This model was produced with a single command:

python -m mlx_vlm.convert \
    --hf-path apple/FastVLM-0.5B \
    --mlx-path FastVLM-0.5B-MLX-4bit \
    -q
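If you rerun the conversion yourself, a quick sanity check on the output directory can catch an interrupted run. This is a sketch; the file-layout assumption (one or more safetensors shards plus a config.json) is based on typical mlx_vlm.convert output:

```python
from pathlib import Path

def looks_like_mlx_checkpoint(model_dir: str) -> bool:
    # Assumption: a converted MLX checkpoint contains at least one
    # .safetensors shard and a config.json alongside it.
    d = Path(model_dir)
    return (
        d.is_dir()
        and any(d.glob("*.safetensors"))
        and (d / "config.json").is_file()
    )
```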

Usage

Requires mlx_vlm (v0.3.11 or later).
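Because older mlx_vlm releases lack the pure-MLX FastViTHD tower, it can be worth failing fast on the installed version. A minimal sketch with a naive x.y.z comparison (no pre-release or dev-tag handling):

```python
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(ver: str, minimum: str = "0.3.11") -> bool:
    # Naive numeric comparison; fine for plain x.y.z release tags.
    parse = lambda s: tuple(int(p) for p in s.split("."))
    return parse(ver) >= parse(minimum)

def mlx_vlm_is_recent_enough() -> bool:
    try:
        return meets_minimum(version("mlx_vlm"))
    except PackageNotFoundError:
        return False
```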

from mlx_vlm import load, generate
from mlx_vlm.utils import load_config
from mlx_vlm.prompt_utils import apply_chat_template

# Load model and processor (downloads ~530 MB on first run)
model, processor = load("leonardoventurini/FastVLM-0.5B-MLX-4bit")
config = load_config("leonardoventurini/FastVLM-0.5B-MLX-4bit")

# Build a chat-template prompt with one image
prompt = apply_chat_template(
    processor, config, "Describe this image.", num_images=1
)

# Generate: pass any PIL Image
result = generate(model, processor, prompt, image=your_pil_image, max_tokens=100)
print(result)
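generate expects a PIL image, so a small loading helper rounds out the example. A sketch, assuming Pillow is available (mlx_vlm depends on it); converting to RGB up front is a common safeguard against grayscale, RGBA, or palette files:

```python
from PIL import Image

def load_image(path: str) -> Image.Image:
    # Convert to RGB so unusual image modes don't trip up the
    # preprocessor (resizing and normalization happen downstream).
    return Image.open(path).convert("RGB")

# result = generate(model, processor, prompt,
#                   image=load_image("photo.png"), max_tokens=100)
```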

CLI

python -m mlx_vlm.generate \
    --model leonardoventurini/FastVLM-0.5B-MLX-4bit \
    --image path/to/image.png \
    --prompt "What is happening in this image?" \
    --max-tokens 100

License

Same license as the original Apple model. See apple/FastVLM-0.5B for full terms.
