FastVLM-0.5B (MLX 4-bit)

A pre-converted, fully self-contained MLX version of Apple's FastVLM-0.5B vision-language model, quantized to 4-bit precision.

~530 MB total: vision tower (FastViTHD) + LLM (Qwen2-0.5B) + multi-modal projector, all stored as MLX safetensors.

Why this exists

The official apple/FastVLM-0.5B ships a hybrid CoreML + MLX architecture: the vision tower is distributed as a .mlpackage CoreML file, while only the LLM weights are in safetensors. The mlx_vlm library (v0.3.11+) reimplemented the FastViTHD vision tower in pure MLX but expects all weights in safetensors format.

This repository was created by running mlx_vlm.convert on the original Apple checkpoint to produce a model that works out-of-the-box with mlx_vlm.load(): no CoreML dependency, no manual weight surgery.

Model details

Property         Value
---------------  ---------------------------------
Base model       apple/FastVLM-0.5B
Vision encoder   FastViTHD (pure MLX)
Language model   Qwen2-0.5B
Quantization     4-bit (via mlx_vlm.convert)
Total size       ~530 MB
Format           MLX safetensors
Load time        ~0.6 s on Apple Silicon
Inference        ~0.1 s per image, ~183 tok/s
Platform         Apple Silicon Macs (M1/M2/M3/M4)
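As a rough back-of-the-envelope check on the size figure (a sketch only; the exact on-disk size depends on which layers mlx_vlm leaves unquantized):

```python
def quantized_size_mb(num_params: float, bits: int = 4) -> float:
    """Approximate on-disk size of weights-only quantized parameters, in MB."""
    return num_params * bits / 8 / 1e6

# ~0.5B LLM parameters at 4 bits is roughly 250 MB; the vision tower,
# projector, and any layers kept in higher precision account for the
# rest of the ~530 MB total.
llm_mb = quantized_size_mb(0.5e9, bits=4)
```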

Conversion

This model was produced with a single command:

python -m mlx_vlm.convert \
    --hf-path apple/FastVLM-0.5B \
    --mlx-path FastVLM-0.5B-MLX-4bit \
    -q
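If you rerun the conversion yourself, a quick sanity check on the output directory can catch an interrupted run. This is a sketch; the file-layout assumption (one or more safetensors shards plus a config.json) is based on typical mlx_vlm.convert output:

```python
from pathlib import Path

def looks_like_mlx_checkpoint(model_dir: str) -> bool:
    # Assumption: a converted MLX checkpoint contains at least one
    # .safetensors shard and a config.json alongside it.
    d = Path(model_dir)
    return (
        d.is_dir()
        and any(d.glob("*.safetensors"))
        and (d / "config.json").is_file()
    )
```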

Usage

Requires mlx_vlm (v0.3.11 or later).
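Because older mlx_vlm releases lack the pure-MLX FastViTHD tower, it can be worth failing fast on the installed version. A minimal sketch with a naive x.y.z comparison (no pre-release or dev-tag handling):

```python
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(ver: str, minimum: str = "0.3.11") -> bool:
    # Naive numeric comparison; fine for plain x.y.z release tags.
    parse = lambda s: tuple(int(p) for p in s.split("."))
    return parse(ver) >= parse(minimum)

def mlx_vlm_is_recent_enough() -> bool:
    try:
        return meets_minimum(version("mlx_vlm"))
    except PackageNotFoundError:
        return False
```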

from mlx_vlm import load, generate
from mlx_vlm.utils import load_config
from mlx_vlm.prompt_utils import apply_chat_template

# Load model and processor (downloads ~530 MB on first run)
model, processor = load("leonardoventurini/FastVLM-0.5B-MLX-4bit")
config = load_config("leonardoventurini/FastVLM-0.5B-MLX-4bit")

# Build a chat-template prompt with one image
prompt = apply_chat_template(
    processor, config, "Describe this image.", num_images=1
)

# Generate: pass any PIL Image
result = generate(model, processor, prompt, image=your_pil_image, max_tokens=100)
print(result)
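generate expects a PIL image, so a small loading helper rounds out the example. A sketch, assuming Pillow is available (mlx_vlm depends on it); converting to RGB up front is a common safeguard against grayscale, RGBA, or palette files:

```python
from PIL import Image

def load_image(path: str) -> Image.Image:
    # Convert to RGB so unusual image modes don't trip up the
    # preprocessor (resizing and normalization happen downstream).
    return Image.open(path).convert("RGB")

# result = generate(model, processor, prompt,
#                   image=load_image("photo.png"), max_tokens=100)
```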

CLI

python -m mlx_vlm.generate \
    --model leonardoventurini/FastVLM-0.5B-MLX-4bit \
    --image path/to/image.png \
    --prompt "What is happening in this image?" \
    --max-tokens 100

License

Same license as the original Apple model. See apple/FastVLM-0.5B for full terms.
