Instructions to use google/gemma-4-31B-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-4-31B-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-31B-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("google/gemma-4-31B-it")
model = AutoModelForMultimodalLM.from_pretrained("google/gemma-4-31B-it", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use google/gemma-4-31B-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-4-31B-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-4-31B-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/google/gemma-4-31B-it

SGLang

How to use google/gemma-4-31B-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-4-31B-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-4-31B-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-4-31B-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-4-31B-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use google/gemma-4-31B-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-4-31B-it
```

Chat Template has a bug.

#62

by Reithan - opened Apr 13

Discussion

Reithan

Apr 13

Line 215 opens if message['role'] != 'tool', but the endif for this line isn't closed until line 337.

Meaning, the entire message is skipped for any 'tool' message.

The prior chat template did not have this issue. Please revert.

Reithan

Apr 13

Wait... it looks like the forward scan takes care of this... but the indentation is still totally wack.

Reithan

Apr 13

OK, here's the actual bug. I figured it out:

This logic is so scuffed. The script has ns.prev_message_type, which is just tracked in-loop for THIS message (not prev), to determine closing.
And prev_nt.role, which is calculated each loop by searching backwards, to determine the opening.

This leads to a specific edge-case bug if you have 2 back-to-back assistant messages, but the first message has content and/or no tool responses.

In this case, on line 334, the condition ns_tr_out.flag and not message.get('content') evaluates to false, which will hit the elif not... branch and add a <turn|>, but the next message, being an assistant message, still will not open a new turn. So you'll end up with a turn with no opening <|turn> tag.
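The symptom described above (a closing <turn|> with no matching <|turn>) can be detected mechanically in any rendered prompt. The following is a minimal sketch, not part of the template or of any library: the function name check_turn_balance is hypothetical, and the sample strings are hand-written illustrations rather than real template output. Only the two marker strings are taken from this thread.

```python
import re

# Marker strings as quoted in this thread.
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"


def check_turn_balance(rendered: str) -> list[str]:
    """Return a description of every unbalanced turn marker in `rendered`.

    A trailing open turn is not reported, because a prompt rendered with
    a generation prompt is expected to end inside an open turn.
    """
    problems = []
    turn_is_open = False
    for match in re.finditer(r"<\|turn>|<turn\|>", rendered):
        if match.group() == TURN_OPEN:
            if turn_is_open:
                problems.append(f"open inside open turn at offset {match.start()}")
            turn_is_open = True
        else:
            if not turn_is_open:
                problems.append(f"close without open at offset {match.start()}")
            turn_is_open = False
    return problems


# Hand-written examples (assumed shapes, not actual template output).
balanced = "<|turn>user\nhi<turn|>\n<|turn>model\nhello<turn|>\n"
orphaned = "<|turn>model\nfirst<turn|>\nsecond<turn|>\n"

print(check_turn_balance(balanced))
print(check_turn_balance(orphaned))
```

Running a check like this over the output of apply_chat_template for a conversation with two back-to-back assistant messages would be one way to confirm whether a given template revision still has the problem.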

sanjitchitturi

Apr 13 (edited Apr 13)

I think your second finding is the actual bug.

The problem seems to be that turn opening and turn closing are computed from different state:

prev_nt.role is found by searching backward
ns.prev_message_type / ns_tr_out.flag is tracked in-loop

That makes the logic asymmetric. In the consecutive assistant-message edge case, the template can emit a transition that assumes a turn is already open, but the next assistant message never gets its opening <|turn> tag.

So this looks less like an indentation issue and more like inconsistent turn-state management. The safest fix would be to derive both open/close behavior from one shared state machine and add tests for consecutive assistant/tool transitions.
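To illustrate the "one shared state machine" idea, here is a toy renderer in Python, offered as a sketch rather than a port of the actual Jinja template. The function render_turns, the role name "model", and the choice to fold tool messages into the model turn are all assumptions made for the example; only the <|turn> and <turn|> markers come from this thread. The point is that a single variable, open_role, decides both when a turn opens and when it closes, so the two can never disagree.

```python
def render_turns(messages: list[dict]) -> str:
    """Toy renderer where one state variable drives opening and closing."""
    out = []
    open_role = None  # role of the currently open turn, or None if closed

    for message in messages:
        # Assumption: anything that is not a user message (assistant or
        # tool) belongs to the model turn.
        role = "user" if message["role"] == "user" else "model"

        if open_role != role:
            # Close the previous turn, if any, before opening a new one.
            if open_role is not None:
                out.append("<turn|>\n")
            out.append(f"<|turn>{role}\n")
            open_role = role

        # Assumption: content is a plain string in this toy example.
        out.append((message.get("content") or "") + "\n")

    # Close whatever turn is still open at the end of the conversation.
    if open_role is not None:
        out.append("<turn|>\n")
    return "".join(out)


# The edge case from this thread: two back-to-back assistant messages.
conversation = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "first"},
    {"role": "assistant", "content": "second"},
    {"role": "user", "content": "thanks"},
]
rendered = render_turns(conversation)
print(rendered)
```

In this sketch the two assistant messages share a single model turn, and every <|turn> has exactly one matching <turn|>. A regression test for the real template could assert the same property over consecutive assistant and tool transitions.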

pannaga10

Google org Apr 17

Hi @Reithan @sanjitchitturi
Thanks for reporting this. I was able to reproduce the issue for this particular edge case and have shared it with the engineering team for further review.

Reithan

Apr 18

Here's a version of the jinja with the bug resolved. I've been using it for several days with no further issues.
https://gist.github.com/Reithan/a7431dc0c0b239688a24087bb25c0002

lgusm

Google org 8 days ago

We fixed the chat template last week with a bunch of improvements.
I think this one was fixed too. Do you still see any issues?
