Running Qwen3-VL-2B-Instruct on real security camera feeds — impressive results at IQ2 quantization

#10

by SharpAI - opened 8 days ago

Sharing some real-world results from running this model on live security camera footage via SharpAI Aegis + llama-server.

Setup: UD-IQ2_M quantization (0.7 GB) + mmproj-F16 (781 MB) on MacBook Air M3 24GB.

Input: A Blink battery camera mount at front door.

Output: "A mailman is delivering mail to a suburban house. The mailman is wearing a blue uniform and carrying a white mail bag. The house is white with a brown roof, and there's a driveway with a black car parked in front. The mailman is walking on a brick path surrounded by green bushes and trees."

For a 2B model at aggressive quantization, the scene comprehension is remarkably detailed — it correctly identifies the person's role (mailman), clothing, objects, the environment, and spatial relationships.

This is being used in a real product for continuous security camera analysis. The model runs comfortably on a Mac Mini with 8 GB RAM alongside other system tasks.

Great work on the GGUF conversion — the Unsloth chat template fixes are appreciated!

App: https://www.sharpai.org (free, Mac/Windows/Linux)

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment