fix: embed chat_template in tokenizer_config.json
The chat_template field is missing from tokenizer_config.json. The template exists as a separate chat_template.jinja file, but AutoTokenizer.from_pretrained() only reads from tokenizer_config.json. This causes apply_chat_template() to fail in transformers.js and other non-Python tooling.
Gemma 2 and Gemma 3 models include this field correctly. This PR embeds the existing chat_template.jinja content into tokenizer_config.json so tokenizers can find it without needing a separate file loader.
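For reference, the embedding step can be sketched roughly as below. This is a minimal illustration and not necessarily how the PR was produced: the helper name `embed_chat_template` is hypothetical, and it assumes a local checkout of the model repo with `tokenizer_config.json` and `chat_template.jinja` sitting side by side.

```python
import json
from pathlib import Path


def embed_chat_template(model_dir: str) -> dict:
    """Copy chat_template.jinja into the chat_template field of tokenizer_config.json.

    Hypothetical helper: assumes both files live directly in model_dir.
    """
    root = Path(model_dir)
    config_path = root / "tokenizer_config.json"
    template_path = root / "chat_template.jinja"

    # Load the existing tokenizer config and leave every other key untouched.
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Store the Jinja template verbatim as a JSON string.
    config["chat_template"] = template_path.read_text(encoding="utf-8")

    # Write the config back; ensure_ascii=False keeps non-ASCII characters readable.
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

Calling `embed_chat_template("path/to/local/checkout")` rewrites the config in place. The separate `chat_template.jinja` file is left where it is, so loaders that prefer it keep working.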
Discovered while building wandler, an OpenAI-compatible inference server powered by transformers.js.
See also: https://huggingface.co/google/gemma-4-E2B-it/discussions/8 (same fix for E2B by @piero-atelico)
I think this fix, like mine, should just be applied to all models. It seems the default is now the jinja approach (https://github.com/huggingface/transformers/issues/45205), but this change is not fully documented and it has been very confusing to a bunch of people (you and me included).