Cannot run the example code locally
I have cloned the repo and tried to run:
```python
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForSeq2SeqLM
processor = AutoProcessor.from_pretrained("t5gemma-2-4b-4b")
model = AutoModelForSeq2SeqLM.from_pretrained("t5gemma-2-4b-4b")
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = " in this image, there is"
model_inputs = processor(text=prompt, images=image, return_tensors="pt")
generation = model.generate(**model_inputs, max_new_tokens=20, do_sample=False)
print(processor.decode(generation[0]))
```
I am on Python 3.12 with:
- torch 2.9.1
- torchvision 0.24.1
- transformers 4.57.3
- pillow 12.0.0
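For anyone comparing environments, a version list like the one above can be collected with a small standard-library script. This is only a sketch: the helper name `installed_versions` and the `PACKAGES` tuple are made up for illustration, and the distribution names are assumed to match what pip installed.

```python
from importlib import metadata

# Distribution names as installed by pip (assumed; adjust to your environment).
PACKAGES = ("torch", "torchvision", "transformers", "pillow")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    import sys

    print("python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

Running it inside the same virtual environment that runs `example.py` makes sure the reported versions are the ones actually being imported.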
But I get
```bash
python .\example.py
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
The tokenizer you are loading from 't5gemma-2-4b-4b' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
Traceback (most recent call last):
  File "C:\Users\test\gemma4t\example.py", line 5, in <module>
    processor = AutoProcessor.from_pretrained("t5gemma-2-4b-4b")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\gemma\venv\Lib\site-packages\transformers\models\auto\processing_auto.py", line 396, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\gemma\venv\Lib\site-packages\transformers\processing_utils.py", line 1396, in from_pretrained
    return cls.from_args_and_dict(args, processor_dict, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\gemma\venv\Lib\site-packages\transformers\processing_utils.py", line 1197, in from_args_and_dict
    processor = cls(*args, **valid_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\gemma\venv\Lib\site-packages\transformers\models\gemma3\processing_gemma3.py", line 67, in __init__
    self.image_token_id = tokenizer.image_token_id
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\test\gemma\venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 1128, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: GemmaTokenizerFast has no attribute image_token_id
```
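The traceback shows the processor reading `image_token_id` from the tokenizer and failing. A quick way to confirm what the loaded tokenizer exposes, without going through `AutoProcessor`, is a check like the sketch below. The helper `missing_attributes` is hypothetical and only wraps plain `getattr`; the only attribute name used is the one from the traceback, and the commented usage assumes the same local checkpoint path as in the example.

```python
def missing_attributes(obj, names=("image_token_id",)):
    """Return the subset of `names` that `obj` does not provide."""
    missing = []
    for name in names:
        try:
            getattr(obj, name)
        except AttributeError:
            missing.append(name)
    return missing


# Usage against the real tokenizer (left commented out because it needs
# the checkpoint on disk and transformers installed):
#
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("t5gemma-2-4b-4b")
# print(type(tokenizer).__name__, missing_attributes(tokenizer))
```

If the list comes back non-empty, the tokenizer that was loaded does not carry the image token metadata the processor expects, which narrows the problem to the tokenizer files or the installed library version rather than the example script itself.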
Which versions and settings are needed to run this?
I am getting a similar error when trying to run gemma-3-12b-it with the new LTX-2 model in Comfy. I upgraded transformers to 5.0.0.dev0 but still can't load Gemma.