Possible chat_template.jinja issue: nullable $ref tool schemas are rendered as empty types

#87

by sigjhl - opened 19 days ago

Discussion

sigjhl

19 days ago

I think there may be a bug in the Gemma chat template's tool declaration rendering.

I tested this through llama.cpp/llama-server with unsloth/gemma-4-31B-it-GGUF, but the issue looks like it comes from the model chat template itself rather than from llama.cpp specifically.

The short version: tool parameters using the common JSON Schema pattern anyOf: [$ref, null] are rendered into the prompt as empty type fields. This strips the useful schema information before the model sees it.

Example schema shape

Many typed tool systems generate schemas like this for nullable structured fields:

{
  "type": "object",
  "properties": {
    "some_assertion_field": {
      "anyOf": [
        {"$ref": "#/$defs/FeatureAssertion"},
        {"type": "null"}
      ],
      "default": null
    }
  },
  "$defs": {
    "FeatureAssertion": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "value": {
          "type": "string",
          "enum": ["positive", "negative", "not_mentioned", "not_assessable"]
        },
        "ref_ids": {
          "type": "array",
          "items": {"type": "string"}
        }
      },
      "required": ["value"]
    }
  }
}

The expected tool argument is a nested object:

{
  "some_assertion_field": {
    "value": "positive",
    "ref_ids": ["s001"]
  }
}

Actual rendered declaration

In the rendered Gemma tool declaration, the field becomes:

some_assertion_field:{type:<|"|><|"|>}

So the model sees the field name, but not:

anyOf
$ref
the FeatureAssertion definition
the required value field
the ref_ids field
the enum values

Observed behavior

With that rendered declaration, the model called the tool with primitive or invented values:

some_assertion_field:true

or after validation feedback:

some_assertion_field:{assertion:true}

This makes sense if the model was never shown the nested object schema.
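
To make the mismatch concrete, here is a small Python check of the two observed arguments against the FeatureAssertion shape. It is only a sketch: `violations` is a hypothetical helper that covers the handful of JSON Schema keywords used in this example, not a full validator.

```python
FEATURE_ASSERTION = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "value": {
            "type": "string",
            "enum": ["positive", "negative", "not_mentioned", "not_assessable"],
        },
        "ref_ids": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["value"],
}


def violations(value, schema):
    """List problems found using a small subset of JSON Schema keywords."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return [f"expected object, got {type(value).__name__}"]
        errors = [f"missing required key {key!r}"
                  for key in schema.get("required", []) if key not in value]
        props = schema.get("properties", {})
        for key, item in value.items():
            if key in props:
                errors += violations(item, props[key])
            elif schema.get("additionalProperties") is False:
                errors.append(f"unexpected key {key!r}")
        return errors
    if kind == "array":
        if not isinstance(value, list):
            return [f"expected array, got {type(value).__name__}"]
        return [err for item in value
                for err in violations(item, schema.get("items", {}))]
    if kind == "string":
        if not isinstance(value, str):
            return [f"expected string, got {type(value).__name__}"]
        if "enum" in schema and value not in schema["enum"]:
            return [f"{value!r} is not an allowed value"]
    return []


# The two arguments the model produced, then the expected one.
print(violations(True, FEATURE_ASSERTION))
print(violations({"assertion": True}, FEATURE_ASSERTION))
print(violations({"value": "positive", "ref_ids": ["s001"]}, FEATURE_ASSERTION))
```

Both observed arguments fail for reasons the model could not have known from the rendered declaration: the first is not an object at all, and the second uses a key that does not exist while omitting the required one.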

What seems to cause it

The current/default chat_template.jinja has a format_parameters macro that appears to assume a property has a direct type:

{%- if value['type'] | upper == 'STRING' -%}
  ...
{%- elif value['type'] | upper == 'ARRAY' -%}
  ...
{%- endif -%}

...

type:<|"|>{{ value['type'] | upper }}<|"|>

But for this common schema pattern:

{
  "anyOf": [
    {"$ref": "#/$defs/FeatureAssertion"},
    {"type": "null"}
  ],
  "default": null
}

there is no property-level type. The type information is inside anyOf, and the object shape is in $defs.
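
To see how the empty type arises, here is a rough Python analogue of that template line. It assumes the template engine treats a missing value['type'] as an empty string, which is consistent with the rendered output shown earlier. `render_type_only` is a hypothetical helper for illustration, not something in the template.

```python
QUOTE = '<|"|>'  # Gemma's string delimiter, as it appears in the rendered output


def render_type_only(name, prop):
    """Mimic `type:<|"|>{{ value['type'] | upper }}<|"|>` for one property."""
    # Assumption: a missing 'type' key behaves like an empty string.
    type_name = str(prop.get("type", "")).upper()
    return f"{name}:{{type:{QUOTE}{type_name}{QUOTE}}}"


nullable_ref = {
    "anyOf": [
        {"$ref": "#/$defs/FeatureAssertion"},
        {"type": "null"},
    ],
    "default": None,
}

print(render_type_only("some_assertion_field", nullable_ref))
# some_assertion_field:{type:<|"|><|"|>}
print(render_type_only("plain_field", {"type": "string"}))
# plain_field:{type:<|"|>STRING<|"|>}
```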

The template also does not appear to emit root $defs in format_function_declaration, so even if a field references #/$defs/FeatureAssertion, the definition is not included in the rendered tool declaration.
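
Until the template handles these constructs, one client-side option is to flatten the schema before passing it to the template. The Python sketch below is an assumption-laden alternative, not the template patch described later: `resolve_pointer`, `simplify`, and `flatten_tool_schema` are hypothetical names, it handles only local refs, it assumes the schema is not recursive, and it collapses anyOf: [X, null] to X, which discards the explicit nullability. Whether the template then renders the nested object's properties in full was not verified here.

```python
def resolve_pointer(root, ref):
    """Follow a local reference such as '#/$defs/FeatureAssertion'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref}")
    node = root
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping: '~1' means '/', '~0' means '~'.
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def simplify(node, root):
    """Inline $ref targets and collapse anyOf: [X, null] down to X."""
    if isinstance(node, list):
        return [simplify(item, root) for item in node]
    if not isinstance(node, dict):
        return node
    if "$ref" in node:
        target = resolve_pointer(root, node["$ref"])
        rest = {k: v for k, v in node.items() if k != "$ref"}
        return simplify({**target, **rest}, root)
    out = {k: simplify(v, root) for k, v in node.items()}
    branches = out.get("anyOf")
    if isinstance(branches, list):
        non_null = [b for b in branches if b != {"type": "null"}]
        if len(non_null) == 1 and len(non_null) < len(branches):
            del out["anyOf"]
            out = {**non_null[0], **out}
    return out


def flatten_tool_schema(schema):
    """Return a copy with refs inlined and the root $defs removed."""
    body = {k: v for k, v in schema.items() if k != "$defs"}
    return simplify(body, schema)


schema = {
    "type": "object",
    "properties": {
        "some_assertion_field": {
            "anyOf": [{"$ref": "#/$defs/FeatureAssertion"}, {"type": "null"}],
            "default": None,
        }
    },
    "$defs": {
        "FeatureAssertion": {
            "type": "object",
            "additionalProperties": False,
            "properties": {
                "value": {
                    "type": "string",
                    "enum": ["positive", "negative", "not_mentioned", "not_assessable"],
                },
                "ref_ids": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["value"],
        }
    },
}

flat = flatten_tool_schema(schema)
print(flat["properties"]["some_assertion_field"]["type"])
print(flat["properties"]["some_assertion_field"]["required"])
```

After flattening, the property carries a direct type of object, so a template that branches on value['type'] at least has something to render.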

I also noticed the current template's format_argument does not special-case null/None values, so default: null can be rendered as a blank value in some contexts.

Workaround tested

I made a small local patch to the Gemma Jinja template while keeping the native Gemma format.

The patch:

renders anyOf when present
renders $ref when present
emits root $defs
only emits type: when a direct type exists
renders null/None as null

After this, the rendered declaration preserved the schema:

some_assertion_field:{
  anyOf:[
    {$ref:<|"|>#/$defs/FeatureAssertion<|"|>},
    {type:<|"|>null<|"|>}
  ]
}

defs:{
  FeatureAssertion:{
    properties:{
      value:{
        enum:[
          <|"|>positive<|"|>,
          <|"|>negative<|"|>,
          <|"|>not_mentioned<|"|>,
          <|"|>not_assessable<|"|>
        ],
        type:<|"|>string<|"|>
      },
      ref_ids:{
        items:{type:<|"|>string<|"|>},
        type:<|"|>array<|"|>
      }
    },
    required:[<|"|>value<|"|>],
    type:<|"|>object<|"|>
  }
}

With that patched template, the same model generated the correct nested tool argument:

some_assertion_field:{ref_ids:[<|"|>s001<|"|>],value:<|"|>positive<|"|>}

and the tool call succeeded.
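
For comparison, the patch rules can be approximated outside Jinja. The Python sketch below is not the patch itself: `render` is a hypothetical serializer that simply emits whatever keys a schema node actually has, using the <|"|> delimiter seen in the rendered output. Key order and whitespace are not meant to match the template's output exactly.

```python
QUOTE = '<|"|>'  # Gemma's string delimiter, as it appears in the rendered output


def render(value):
    """Serialize a schema node or argument without assuming a 'type' key."""
    if value is None:
        return "null"  # render null/None as null rather than a blank
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return f"{QUOTE}{value}{QUOTE}"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, list):
        return "[" + ",".join(render(item) for item in value) + "]"
    if isinstance(value, dict):
        # Only keys that exist are emitted, so anyOf and $ref survive
        # and no empty 'type:' appears when 'type' is absent.
        return "{" + ",".join(f"{k}:{render(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported value: {value!r}")


prop = {
    "anyOf": [{"$ref": "#/$defs/FeatureAssertion"}, {"type": "null"}],
    "default": None,
}
print("some_assertion_field:" + render(prop))

call_args = {"ref_ids": ["s001"], "value": "positive"}
print("some_assertion_field:" + render(call_args))
```

The design choice is to walk the schema generically instead of branching on type first, which is why a property with only anyOf or $ref is not reduced to an empty type field.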

Note on the recent template update

I saw that chat_template.jinja was updated recently, but the current file still appears to have the same issue: format_parameters still primarily branches on value['type'], still emits type:<|"|>{{ value['type'] | upper }}<|"|>, and I do not see handling for property-level anyOf, property-level $ref, or root $defs in the tool declaration.

Suggested fix

Please consider updating the tool declaration rendering in chat_template.jinja to preserve common JSON Schema constructs:

render property-level anyOf
render property-level $ref
include root $defs
avoid emitting type: when type is missing
render null/None as null

This would make Gemma's advertised tool support work better with real-world typed schemas, especially schemas generated by Pydantic and similar libraries.

sigjhl

19 days ago

I just went ahead and made a PR for the template fix, sorry for rambling here
