Image Dimension Test Results
Google says you can use any image resolution. I decided to test their claims.
I made 3 instances and tested a few images at the following resolutions, using the exact same settings, including the seed:
1080 x 1920 (100%)
810 x 1440 (75%)
540 x 960 (50%)
The 100% & 75% images gave incomplete descriptions, but the images resized to 50% gave a complete description.
Was the image encoder only trained with a maximum resolution of 1024^2?
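For anyone who wants to check the arithmetic behind that question, here is a minimal sketch that compares the pixel count of each test resolution against 1024^2. The 1024^2 threshold is only the guess from the question above, not a documented limit of the image encoder, and the names `scaled_size`, `pixel_report`, and `HYPOTHETICAL_LIMIT` are made up for illustration.

```python
# Pixel-count check for the three test resolutions.
# HYPOTHETICAL_LIMIT is an assumption taken from the question above,
# not a documented property of the model's image encoder.
BASE_WIDTH, BASE_HEIGHT = 1080, 1920
HYPOTHETICAL_LIMIT = 1024 ** 2


def scaled_size(width, height, percent):
    """Return (width, height) scaled by an integer percentage."""
    return (width * percent // 100, height * percent // 100)


def pixel_report(percentages=(100, 75, 50)):
    """Return one row per scale: (percent, width, height, pixels, over_limit)."""
    rows = []
    for percent in percentages:
        width, height = scaled_size(BASE_WIDTH, BASE_HEIGHT, percent)
        pixels = width * height
        rows.append((percent, width, height, pixels, pixels > HYPOTHETICAL_LIMIT))
    return rows


if __name__ == "__main__":
    for percent, width, height, pixels, over in pixel_report():
        status = "above" if over else "below"
        print(f"{percent:>3}%  {width} x {height}  {pixels:>9,} px  {status} 1024^2")
```

The 100% and 75% images (2,073,600 and 1,166,400 pixels) land above 1024^2 = 1,048,576, while the 50% image (518,400 pixels) lands below it. That lines up with the results above but does not prove the cause; a token budget or output length limit could produce the same pattern.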
If anyone else tests Gemma4 this way, please include your results as well.
Hi @Koitenshin, great observation, and thanks for testing this. We have verified the responses at the three specified resolutions on the gemma-4-E4B-it model, and the model provides complete descriptions for all three. Please refer to this sample gist. However, to investigate your findings more closely, could you share reproducible code along with the token budget limits and the prompt you used? Thank you.