Chat template to make vLLM deployment work?
#4
by vrdn23 - opened
Hey folks,
Thanks for the model! Is there a chat_template.json/jinja to be added to make it vLLM compatible?
cc @RyanMullins
Currently, deployment start-up succeeds, but inference fails with this error:
curl -i http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "9.11 and 9.8, which is greater?" }
    ],
    "max_tokens": 250,
    "reasoning_effort": "low",
    "extra_body": {
      "thinking_token_budget": 10
    }
  }'
HTTP/1.1 400 Bad Request
date: Mon, 11 May 2026 21:59:40 GMT
server: uvicorn
content-length: 216
content-type: application/json
{"error":{"message":"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.","type":"BadRequestError","param":null,"code":400}}
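Hello, this is a base model without instruction tuning, so its tokenizer does not define a chat template, which is why the /v1/chat/completions request fails. For general chat use, I think you should use google/gemma-4-E4B-it instead; that repository includes a chat_template.jinja file.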
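If it helps, here is a minimal sketch of how you could check up front whether a tokenizer defines a chat template before pointing a chat endpoint at it. The helper name `has_chat_template` and the two stand-in objects are hypothetical and only there so the snippet runs without downloading anything; with a real checkpoint you would pass the object returned by `AutoTokenizer.from_pretrained(...)`, assuming it exposes a `chat_template` attribute the way Transformers tokenizers normally do.

```python
from types import SimpleNamespace


def has_chat_template(tokenizer) -> bool:
    """Return True if the tokenizer carries a non-empty chat template."""
    # A missing attribute, None, or an empty string all count as "no template".
    return bool(getattr(tokenizer, "chat_template", None))


# Hypothetical stand-ins for a base checkpoint (no template) and an
# instruction-tuned checkpoint (template present). These are NOT real tokenizers.
base_like = SimpleNamespace(chat_template=None)
it_like = SimpleNamespace(chat_template="{% for m in messages %}{{ m['content'] }}{% endfor %}")

print(has_chat_template(base_like))  # False
print(has_chat_template(it_like))    # True

# With real checkpoints (assumption: network access and model access are set up):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
#   print(has_chat_template(tok))
```

If the check comes back False for the checkpoint you are serving, the chat endpoint will likely reject requests with the same 400 error shown above unless a template is supplied separately.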