Gemma-3-1B runs beautifully on Snapdragon 865

#45
by 3morixd - opened

Tested Gemma-3-1B-it on our 40-device phone farm (Samsung S20 FE, Snapdragon 865, 8GB RAM).

The model loads in under 2 seconds and generates at ~15-17 tokens/sec with Q4_K_M quantization via llama.cpp. That's faster than most people type.

What impressed us most: the instruction-following quality at this size is remarkable. It handles multi-turn conversations on-device without any cloud dependency.

We've created a mobile-optimized version: dispatchAI/Gemma-2-2B-IT-mobile

Dispatch AI (FZE) — building mobile AI from Sharjah, UAE.

Sign up or log in to comment