LFM2.5-VL-1.6B is my daily driver for security camera analysis: 51 tokens/sec with full Metal GPU acceleration, and it just works
Wanted to share some real-world results from months of running LFM2.5-VL-1.6B daily in production on live security camera feeds via SharpAI Aegis + llama-server.
Setup: Q8_0 quantization (1.2 GB) + mmproj-Q8_0 (556 MB) on Apple Silicon M3.
Input: A Blink battery camera mounted at front door.
Output: "A mailman is delivering mail to a suburban house. The mailman is wearing a blue uniform and carrying a white mail bag. The house is white with a brown roof, and there's a driveway with a black car parked in front. The mailman is walking on a brick path surrounded by green bushes and trees."
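For anyone who wants to reproduce this kind of per-frame description outside the app: llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, and you can send a camera frame as an inline base64 image. A minimal sketch, assuming a local llama-server started with the model and mmproj files (the file names, port, and prompt below are illustrative, not the exact SharpAI Aegis configuration):

```python
import base64
import json
import urllib.request

# Assumed server launch (paths are placeholders):
#   llama-server -m LFM2.5-VL-1.6B-Q8_0.gguf --mmproj mmproj-Q8_0.gguf -ngl 99 --port 8080
SERVER = "http://127.0.0.1:8080"  # assumed default llama-server port


def build_request(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-style chat payload with one inline base64 JPEG frame."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 200,
    }


def describe_frame(image_bytes: bytes,
                   prompt: str = "Describe what is happening in this scene.") -> str:
    """POST a single camera frame to llama-server and return the description."""
    req = urllib.request.Request(
        SERVER + "/v1/chat/completions",
        data=json.dumps(build_request(image_bytes, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `describe_frame(open("frame.jpg", "rb").read())` against a running server returns a scene description like the one above; looping it over snapshots from the camera feed gives you the always-on monitoring pattern.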
Performance numbers:
- ~51.8 predicted tokens/sec
- ~99% Apple M3 GPU during inference (Metal Active), ~2.3 GB GPU memory
- Total disk footprint: 1.7 GB
This has been my go-to VLM for continuous security monitoring. The combination of speed, small size, and consistent output quality at Q8_0 makes it ideal for always-on applications where you need reliable scene descriptions without burning compute.
Excellent work by the LiquidAI team; this model punches well above its weight class.
App: https://www.sharpai.org (free, Mac/Windows/Linux)