[Feature request] Eliminate pre-attention RMSNorm in Gemma 4 via scale invariance + weight folding
Because RMSNorm is scale invariant, a sequence of RMSNorm, linear projection, and a second RMSNorm allows the normalization step of the first RMSNorm to be eliminated entirely: the second norm cancels any positive per-token scale applied to its input, so the simplification is mathematically lossless.
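The cancellation can be written out as a short derivation. This is a sketch that ignores the small epsilon usually added inside the square root, and the symbols $g_1$, $g_2$ (norm weights) and $W$ (projection matrix) are notation introduced here for illustration, not taken from the Gemma code:

$$
\mathrm{RMS}(x) = \sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} x_i^2}, \qquad
\mathrm{RMSNorm}_{g}(x) = g \odot \frac{x}{\mathrm{RMS}(x)}
$$

$$
\mathrm{RMS}(a\,x) = a\,\mathrm{RMS}(x) \ \text{for } a > 0
\quad\Longrightarrow\quad
\mathrm{RMSNorm}_{g}(a\,x) = \mathrm{RMSNorm}_{g}(x)
$$

$$
\mathrm{RMSNorm}_{g_2}\!\bigl(W\,\mathrm{RMSNorm}_{g_1}(x)\bigr)
= \mathrm{RMSNorm}_{g_2}\!\Bigl(\tfrac{1}{\mathrm{RMS}(x)}\,W\,\mathrm{diag}(g_1)\,x\Bigr)
= \mathrm{RMSNorm}_{g_2}\!\bigl(W\,\mathrm{diag}(g_1)\,x\bigr)
$$

The scalar $1/\mathrm{RMS}(x)$ passes through the linear map and is removed by the second norm, which leaves only the per-channel weights $g_1$ of the first norm in play.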
For models that apply QKV normalization, such as Gemma 4, this means the pre-attention RMSNorm can be removed with no change to model outputs (see the FlashNorm paper).
However, the pre-attention norm's learned weights are still needed. They can be eliminated cleanly by folding them into the QKV projection weights using the FlashNorm weight-folding trick, again with no loss in model accuracy.
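The equivalence can be checked numerically with a small, dependency-free sketch. It is a toy illustration under stated assumptions, not the Gemma 4 implementation: it uses a single vector and a single head, sets epsilon to zero, and assumes a plain per-channel norm weight (some implementations parameterize the scale as `1 + weight`, in which case that effective scale is what would be folded). The helper names `rms_norm`, `matvec`, and `fold_norm_weight` are hypothetical.

```python
import math
import random

def rms_norm(x, weight=None, eps=0.0):
    """RMSNorm of one vector: x / sqrt(mean(x^2) + eps), optionally scaled per channel."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms for v in x]
    if weight is not None:
        y = [v * w for v, w in zip(y, weight)]
    return y

def matvec(W, x):
    """Multiply a (d_out x d_in) matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fold_norm_weight(W, g):
    """Fold per-channel norm weights g into W, i.e. W' = W @ diag(g)."""
    return [[w * gi for w, gi in zip(row, g)] for row in W]

random.seed(0)
d_in, d_out = 8, 4
x = [random.gauss(0, 1) for _ in range(d_in)]            # hidden state for one token
g = [random.uniform(0.5, 1.5) for _ in range(d_in)]      # pre-attention norm weights
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # e.g. a Q projection
q_g = [random.uniform(0.5, 1.5) for _ in range(d_out)]   # Q-norm weights

# Original path: RMSNorm -> linear projection -> RMSNorm
reference = rms_norm(matvec(W, rms_norm(x, g)), q_g)

# Simplified path: folded linear projection -> RMSNorm (no pre-attention norm)
W_folded = fold_norm_weight(W, g)
simplified = rms_norm(matvec(W_folded, x), q_g)

max_abs_diff = max(abs(a - b) for a, b in zip(reference, simplified))
print(f"max abs difference: {max_abs_diff:.3e}")
```

With epsilon set to zero the two paths differ only by floating-point rounding. With a nonzero epsilon in either norm, the match is approximate rather than exact, which is worth verifying against the real model's epsilon values before relying on it.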
For reference, we have applied this weight-folding trick to a few LLMs (Llama, Qwen, SmolLM) here:
https://huggingface.co/models?other=weightless-rmsnorm