LFM2.5-VL-1.6B is my daily driver for security camera analysis β€” 51 tokens/sec with full Metal GPU acceleration, and it just works

#7
by SharpAI - opened

Wanted to share some real-world production results from running LFM2.5-VL-1.6B on live security camera feeds daily for months via SharpAI Aegis + llama-server.

Setup: Q8_0 quantization (1.2 GB) + mmproj-Q8_0 (556 MB) on Apple Silicon M3.
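For anyone wanting to reproduce the serving side, it's a stock llama-server launch. A minimal sketch, assuming typical GGUF filenames and a default port (your exact paths may differ):

```shell
# Sketch of a llama-server launch for this setup (llama.cpp).
# Model/mmproj filenames and port are illustrative assumptions.
llama-server \
  -m LFM2.5-VL-1.6B-Q8_0.gguf \
  --mmproj mmproj-LFM2.5-VL-1.6B-Q8_0.gguf \
  -ngl 99 \
  --port 8080
# -m:       the 1.2 GB Q8_0 text model
# --mmproj: the 556 MB Q8_0 vision projector
# -ngl 99:  offload all layers to the Metal GPU on Apple Silicon
```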

Input: A Blink battery camera mounted at the front door.

Output: "A mailman is delivering mail to a suburban house. The mailman is wearing a blue uniform and carrying a white mail bag. The house is white with a brown roof, and there's a driveway with a black car parked in front. The mailman is walking on a brick path surrounded by green bushes and trees."
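Descriptions like that come back over llama-server's OpenAI-compatible chat endpoint, with the camera frame passed as a base64 data URI. A hedged sketch (port, frame filename, and prompt are illustrative; `base64 -i` is the macOS flag):

```shell
# Illustrative request against llama-server's OpenAI-compatible API.
# Assumes the server from the setup above is listening on port 8080
# and a captured frame has been saved as frame.jpg.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what is happening in this scene."},
        {"type": "image_url",
         "image_url": {"url": "data:image/jpeg;base64,'"$(base64 -i frame.jpg)"'"}}
      ]
    }]
  }'
```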

Performance numbers:

  • ~51.8 predicted tokens/sec
  • ~99% Apple M3 GPU utilization during inference (Metal active), ~2.3 GB GPU memory
  • Total disk footprint: 1.7 GB

This has been my go-to VLM for continuous security monitoring. The combination of speed, small size, and consistent output quality at Q8_0 makes it ideal for always-on applications where you need reliable scene descriptions without burning compute.

Excellent work by the LiquidAI team β€” this model punches well above its weight class.

App: https://www.sharpai.org (free, Mac/Windows/Linux)
