Instructions for using google/shieldgemma-2-4b-it with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use google/shieldgemma-2-4b-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/shieldgemma-2-4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("google/shieldgemma-2-4b-it", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/shieldgemma-2-4b-it with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/shieldgemma-2-4b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/google/shieldgemma-2-4b-it
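The curl request above can also be issued from Python. A minimal stdlib-only sketch that assembles the same OpenAI-compatible request (assuming the vLLM server started with `vllm serve` is listening on localhost:8000; actually sending it is left commented out so the snippet stands alone):

```python
import json
from urllib.request import Request  # urlopen(req) would send it once the server is up

# Same OpenAI-compatible chat request as the curl command above.
payload = {
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
}

req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# from urllib.request import urlopen; print(urlopen(req).read())  # with a running server
print(req.get_method(), req.full_url)
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client pointed at `base_url="http://localhost:8000/v1"` would work the same way.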
- SGLang
How to use google/shieldgemma-2-4b-it with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/shieldgemma-2-4b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/shieldgemma-2-4b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
- Docker Model Runner
How to use google/shieldgemma-2-4b-it with Docker Model Runner:
docker model run hf.co/google/shieldgemma-2-4b-it
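Whichever deployment path is used, the model's job is the same: answer the safety prompt with "Yes" or "No". As a hedged sketch (not the transformers API; logit values are illustrative), a violation score for such a checker reduces to a two-way softmax over the logits of those two answer tokens:

```python
import math

def violation_probability(yes_logit: float, no_logit: float) -> float:
    """P("Yes" | image, policy) via a numerically stable two-way softmax."""
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Illustrative logits, not real model output:
print(round(violation_probability(2.0, 0.0), 3))  # ≈ 0.881
```

Thresholding this probability (rather than taking the raw "Yes"/"No" string) lets a caller tune the precision/recall trade-off per policy.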
Expected Processor Behavior
I have a question regarding the expected output behavior of the processor for ShieldGemma 2.
When running a minimal example, similar to the one on the model card page, I'm seeing the following:
import requests
from PIL import Image
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("google/shieldgemma-2-4b-it")
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=[image], policies=["dangerous"], return_tensors="pt")
print(processor.batch_decode(inputs['input_ids'])[0])
Output (image tokens truncated for readability):
<bos><bos><start_of_turn>user
<start_of_image><image_soft_token>...<end_of_image>
<start_of_turn>user
You are a policy expert trying to help determine whether a generated or real image is in
violation of the safety policy below. Your decision must very clearly follow the specific
language used in the safety policy and should not include any interpretation outside of the
safety policy.
<BEGIN_SAFETY_POLICY>
No Dangerous Content: The image shall not contain content that facilitates or encourages activities that could cause real-world harm (e.g., building firearms and explosive devices, promotion of terrorism, instructions for suicide).
</END_SAFETY_POLICY>
Does the image violate the above policy? Your Answer must start with 'Yes' or 'No'.
<end_of_turn>
<start_of_turn>model
There are two <bos> tokens, and the first <start_of_turn> token is not closed, but a second one is opened after <end_of_image>. Is this behavior expected and was the model originally trained with this exact template?
Hi @fhenkel, apologies for the late response.
In short, yes: both behaviors are expected, and the model was trained with this exact template.
There are two <bos> tokens because one is added by the tokenizer and the other comes from the ShieldGemma chat template, which itself includes the {{ bos_token }} variable.
As for the unclosed turn: even though it looks like two user turns, the model does not interpret it that way. The image is treated as a prefix payload to the subsequent user text, and the turn is only closed once <end_of_turn> appears. The second <start_of_turn>user acts as a continuation marker rather than a reset.
The model interprets the prompt like this:
[BOS][BOS]
USER:
[image tokens]
[policy + instructions]
MODEL:
Yes / No
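The double-BOS mechanics can be illustrated with a toy stand-in (pure Python; the real template is Jinja inside the tokenizer config, so the function names here are illustrative, not the transformers API):

```python
BOS = "<bos>"

def apply_chat_template(user_text: str) -> str:
    # Toy stand-in for the ShieldGemma chat template, which itself
    # contains the {{ bos_token }} variable.
    return f"{BOS}<start_of_turn>user\n{user_text}<end_of_turn>\n<start_of_turn>model\n"

def encode(text: str, add_bos: bool = True) -> str:
    # Toy stand-in for the tokenizer, which independently prepends its own BOS.
    return (BOS if add_bos else "") + text

prompt = encode(apply_chat_template("Does the image violate the above policy?"))
print(prompt.startswith(BOS + BOS))  # True: one BOS from each source
```

In real pipelines that double prepending is usually suppressed by passing `add_special_tokens=False` when tokenizing template output; here it is intentional, since ShieldGemma 2 was trained with both tokens present.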
If you have any more questions, please let us know.
Thanks