FastVLM-0.5B: MLX 4-bit
A pre-converted, fully self-contained MLX version of Apple's FastVLM-0.5B vision-language model, quantized to 4-bit precision.
~530 MB total: vision tower (FastViTHD), LLM (Qwen2-0.5B), and multi-modal projector, all stored as MLX safetensors.
Why this exists
The official apple/FastVLM-0.5B ships in a hybrid CoreML + MLX layout: the vision tower is distributed as a .mlpackage CoreML file, while only the LLM weights are in safetensors. The mlx_vlm library (v0.3.11+) reimplements the FastViTHD vision tower in pure MLX, but it expects all weights, including the vision tower's, in safetensors format, so the official checkpoint cannot be loaded by it directly.
This repository was created by running mlx_vlm.convert on the original Apple checkpoint to produce a model that works out of the box with mlx_vlm.load(): no CoreML dependency and no manual weight surgery.
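To confirm that a checkpoint really keeps every component in safetensors, you can read each file's header without loading any tensor data. The sketch below relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header) and groups tensor names by their first dotted component. The prefixes you will see (for example `vision_tower`, `language_model`, `multi_modal_projector`) are an assumption about how mlx_vlm names the parts, and both helper names are hypothetical.

```python
import json
import struct
from collections import defaultdict
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor index of a .safetensors file without reading tensor data.

    The file starts with an 8-byte little-endian unsigned length, followed by
    that many bytes of UTF-8 JSON mapping tensor names to dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def group_by_component(model_dir):
    """Count tensors per top-level name prefix across every shard in model_dir."""
    counts = defaultdict(int)
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        for name in read_safetensors_header(shard):
            counts[name.split(".", 1)[0]] += 1
    return dict(counts)
```

For example, calling `group_by_component("FastVLM-0.5B-MLX-4bit")` on a local copy (the directory name is just the `--mlx-path` from the conversion command) should report a vision-tower prefix next to the language-model prefix if the vision weights were converted.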
Model details
| Property | Value |
|---|---|
| Base model | apple/FastVLM-0.5B |
| Vision encoder | FastViTHD (pure MLX) |
| Language model | Qwen2-0.5B |
| Quantization | 4-bit (via mlx_vlm.convert) |
| Total size | ~530 MB |
| Format | MLX safetensors |
| Load time | ~0.6 s on Apple Silicon |
| Inference | ~0.1 s per image, ~183 tok/s |
| Platform | Apple Silicon Macs (M1/M2/M3/M4) |
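The 4-bit row has a storage cost slightly above 4 bits per quantized weight. As a rough sketch, assuming MLX-style affine group quantization with 64 weights per group and one float16 scale plus one float16 bias per group (these defaults are an assumption, not read from this checkpoint), each quantized weight costs 4 + 2·16/64 = 4.5 bits. The pure-Python code below quantizes one group and computes that overhead; it illustrates the scheme and is not MLX's own kernel.

```python
def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit integer codes.

    Returns (codes, scale, bias) such that each value ~= code * scale + bias.
    """
    levels = (1 << bits) - 1                  # 15 steps between the 16 codes of 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0         # constant group: any nonzero scale works
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate floats."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=4, group_size=64, param_bits=16):
    """Storage per weight: the code plus one scale and one bias shared by each group."""
    return bits + 2 * param_bits / group_size


group = [i / 63 - 0.5 for i in range(64)]    # one 64-weight group spanning [-0.5, 0.5]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
print(max(abs(a - b) for a, b in zip(group, restored)))  # bounded by scale / 2
print(bits_per_weight())                                  # 4.5
```

Only tensors the converter actually quantizes pay this rate; any left in 16-bit keep their full size, which is one reason total size is not simply parameters × 4.5 bits.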
Conversion
This model was produced with a single command:
```bash
python -m mlx_vlm.convert --hf-path apple/FastVLM-0.5B --mlx-path FastVLM-0.5B-MLX-4bit -q
```
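After converting, a quick filesystem check can catch an incomplete output before you try to load it. This is a minimal sketch that assumes the output directory holds a `config.json` and one or more `*.safetensors` shards at its top level, and that any leftover CoreML bundle would appear as a `*.mlpackage` directory; `check_self_contained` is a hypothetical helper, not part of mlx_vlm.

```python
from pathlib import Path


def check_self_contained(model_dir):
    """Return a list of problems; an empty list means the directory looks self-contained."""
    root = Path(model_dir)
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    if not any(root.glob("*.safetensors")):
        problems.append("no .safetensors weight files")
    # .mlpackage bundles are directories, and rglob matches directories too.
    leftovers = sorted(p.name for p in root.rglob("*.mlpackage"))
    if leftovers:
        problems.append("CoreML packages still present: " + ", ".join(leftovers))
    return problems
```

For example, `check_self_contained("FastVLM-0.5B-MLX-4bit")` returning `[]` means the files the check looks for are present and no CoreML bundle was left behind.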
Usage
Requires mlx_vlm (v0.3.11 or later).
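Before running the example, you can verify the installed version from Python. This sketch assumes the package is installed under its PyPI distribution name `mlx-vlm` and compares only the numeric release part, so a pre-release such as `0.3.11rc1` is treated as 0.3.11; `parse_release` and `has_min_mlx_vlm` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (0, 3, 11)


def parse_release(ver):
    """Extract the leading numeric release, e.g. '0.3.11rc1' -> (0, 3, 11)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def has_min_mlx_vlm(minimum=MINIMUM):
    """True if the installed mlx-vlm distribution is at least `minimum`."""
    try:
        installed = version("mlx-vlm")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= minimum


print("mlx-vlm OK" if has_min_mlx_vlm() else "install or upgrade: pip install -U mlx-vlm")
```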
```python
from PIL import Image

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load model and processor (downloads ~530 MB on first run)
model, processor = load("leonardoventurini/FastVLM-0.5B-MLX-4bit")
config = load_config("leonardoventurini/FastVLM-0.5B-MLX-4bit")

# Build a chat-template prompt with one image
prompt = apply_chat_template(
    processor, config, "Describe this image.", num_images=1
)

# Generate: pass any PIL Image (replace the placeholder path with your own file)
image = Image.open("path/to/image.png").convert("RGB")
result = generate(model, processor, prompt, image=image, max_tokens=100)
print(result)
```
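The table's timing figures give a rough budget for the call above. Treating the ~0.1 s per image as a fixed vision-side cost and ~183 tok/s as a constant decode rate is an interpretation, not something the table states, so read the result as an order-of-magnitude estimate. Because generation can stop before `max_tokens`, the figure is closer to an upper bound for a warm model. `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(max_tokens, tokens_per_s=183.0, image_s=0.1, load_s=0.0):
    """Rough wall-clock estimate for one generate() call.

    Assumes image processing and decoding run back to back and that decoding
    holds a constant rate; defaults come from the model details table.
    """
    return load_s + image_s + max_tokens / tokens_per_s


print(round(estimate_seconds(100), 2))              # 0.65 (warm model)
print(round(estimate_seconds(100, load_s=0.6), 2))  # 1.25 (including the ~0.6 s load)
```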
CLI
```bash
python -m mlx_vlm.generate --model leonardoventurini/FastVLM-0.5B-MLX-4bit \
  --image path/to/image.png \
  --prompt "What is happening in this image?" \
  --max-tokens 100
```
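To caption a folder of images with the same CLI, a small wrapper can build the argument list and run it once per file. This is a sketch using only the flags shown above; `build_command` and `caption_folder` are hypothetical names, and the collected output is the CLI's raw stdout, which may contain more than the generated text.

```python
import subprocess
import sys
from pathlib import Path

MODEL = "leonardoventurini/FastVLM-0.5B-MLX-4bit"


def build_command(image, prompt, max_tokens=100):
    """Build the argv for one mlx_vlm.generate CLI call."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL,
        "--image", str(image),
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
    ]


def caption_folder(folder, prompt="Describe this image.", pattern="*.png"):
    """Run the CLI once per matching image and collect raw stdout by file name."""
    results = {}
    for image in sorted(Path(folder).glob(pattern)):
        proc = subprocess.run(
            build_command(image, prompt), capture_output=True, text=True, check=True
        )
        results[image.name] = proc.stdout
    return results
```

Each CLI call loads the model again (~0.6 s per the table), so for large batches a single `load()` in Python followed by repeated `generate()` calls avoids that repeated cost.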
License
Same license as the original Apple model. See apple/FastVLM-0.5B for full terms.