[Appreciation] Incredible performance of Gemma 4-26b on consumer hardware — 90 t/s even on an older DDR3 system!

#17
by MightyLoraLord - opened

Hi Google team,

I wanted to share my experience and express my deep gratitude for the amazing work you've done with Gemma 4.

I previously thought my RTX 5060 Ti (16 GB VRAM) had hit its limit running Qwen 3.5-35b-a3b at 45 t/s. However, Gemma 4-26b-a4b (using the iq4-xs quantization) has completely exceeded my expectations:

- Speed at 2048 context: 80-90 t/s
- Speed at 96K context: still maintains a highly usable 40+ t/s
- Intelligence: it feels as smart as, if not smarter than, Qwen 3.5-35b-a3b. Remarkably, it achieves this level of reasoning with a much more concise Chain-of-Thought (CoT) and significantly lower token overhead.
A crucial note on my setup:
I am actually running this on a fairly aged platform: an Intel i7-4790 paired with DDR3-1600 RAM. Given how limited the memory bandwidth of this older DDR3 standard is, the fact that I can achieve these speeds is mind-blowing.
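For anyone curious why the slow system RAM doesn't cap the decode speed here, a rough bandwidth bound is easy to compute. This is only a back-of-envelope sketch: the ~4B active parameters per token (the "a4b" suffix), the ~4.25 bits/weight for iq4-xs, and the bandwidth figures (a ballpark GDDR figure for the GPU, and 25.6 GB/s for dual-channel DDR3-1600) are my own assumptions, not numbers from this thread.

```python
# Back-of-envelope decode-speed bound: each generated token must stream
# all *active* weights through memory, so t/s <= bandwidth / bytes-per-token.
# All figures below are assumptions for illustration, not measured values.

def tokens_per_second_bound(active_params_b, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode t/s from memory bandwidth alone."""
    gb_per_token = active_params_b * bits_per_weight / 8  # GB of weights read per token
    return bandwidth_gb_s / gb_per_token

GPU_BW = 448.0   # GB/s, assumed VRAM bandwidth for a 16 GB card
DDR3_BW = 25.6   # GB/s, dual-channel DDR3-1600 (2 x 12.8 GB/s)

print(f"GPU-resident bound:  {tokens_per_second_bound(4, 4.25, GPU_BW):.0f} t/s")
print(f"DDR3-resident bound: {tokens_per_second_bound(4, 4.25, DDR3_BW):.0f} t/s")
```

Under these assumptions the GPU-side bound comfortably exceeds the observed 80-90 t/s, while a purely DDR3-resident model would be capped around a dozen t/s, which is why the weights fitting in VRAM matters far more than the age of the host platform.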

I am convinced that users with modern DDR4 or DDR5 systems will see even more breathtaking performance once they optimize their setups. This is a true testament to how efficiently Gemma 4 is engineered.

Thank you for making such a powerful and accessible model. You have truly leveled up the capabilities of local LLMs!

Keep up the great work!

Hi @MightyLoraLord -

Thank you for sharing this feedback. We really appreciate it! It’s great to see Gemma 4 26B performing so well even on older setups. Your insights on speed, context scaling, and efficiency are very valuable.

I love the base model, and I'm really impressed with its intelligence and speed.
Looking forward to DavidAU's Gemma4 modifications too.
