Tested on phone farm — great mobile performance

#128
by 3morixd - opened

We benchmarked this model on our 40-device phone farm (Samsung S20 FE, Snapdragon 865, 8GB RAM) using llama.cpp with Q4_K_M quantization.

Results: the model runs smoothly at 12-18 tokens/sec per device. Quantization quality is excellent: we couldn't detect meaningful degradation versus the unquantized model.
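
For a rough sense of what the per-device figure means for the whole farm, here is a back-of-envelope sketch. It assumes all 40 devices decode concurrently and independently at the reported rate, which is an assumption and not something measured above; the function name `aggregate_throughput` is just illustrative.

```python
# Back-of-envelope aggregate throughput for a device farm.
# Assumption: every device decodes concurrently and independently at the
# per-device rate quoted in the post; this is an estimate, not a measurement.

def aggregate_throughput(devices: int, low_tps: float, high_tps: float) -> tuple[float, float]:
    """Return the (low, high) total tokens/sec across `devices` phones."""
    if devices < 0:
        raise ValueError("devices must be non-negative")
    if low_tps > high_tps:
        raise ValueError("low_tps must not exceed high_tps")
    return devices * low_tps, devices * high_tps


low, high = aggregate_throughput(devices=40, low_tps=12.0, high_tps=18.0)
print(f"{low:.0f}-{high:.0f} tokens/sec across the farm")  # 480-720 tokens/sec across the farm
```

Real aggregate numbers would likely come in lower once thermal throttling and scheduling overhead are factored in, so treat this as an upper-bound estimate.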

Anyone deploying on edge or mobile hardware should give this a try. We've been quantizing similar models for mobile deployment at the dispatchAI org.

— Dispatch AI (FZE), Sharjah UAE
