Instructions for using google/shieldgemma-2-4b-it with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use google/shieldgemma-2-4b-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/shieldgemma-2-4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("google/shieldgemma-2-4b-it", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/shieldgemma-2-4b-it with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/shieldgemma-2-4b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/google/shieldgemma-2-4b-it
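The curl request above can also be issued from Python. A minimal stdlib-only sketch that assembles the same OpenAI-compatible request (assuming the vLLM server started with `vllm serve` is listening on localhost:8000; actually sending it is left commented out so the snippet stands alone):

```python
import json
from urllib.request import Request  # urlopen(req) would send it once the server is up

# Same OpenAI-compatible chat request as the curl command above.
payload = {
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
}

req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# from urllib.request import urlopen; print(urlopen(req).read())  # with a running server
print(req.get_method(), req.full_url)
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client pointed at `base_url="http://localhost:8000/v1"` would work the same way.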
- SGLang
How to use google/shieldgemma-2-4b-it with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/shieldgemma-2-4b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/shieldgemma-2-4b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/shieldgemma-2-4b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
- Docker Model Runner
How to use google/shieldgemma-2-4b-it with Docker Model Runner:
docker model run hf.co/google/shieldgemma-2-4b-it
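Whichever deployment path is used, the model's job is the same: answer the safety prompt with "Yes" or "No". As a hedged sketch (not the transformers API; logit values are illustrative), a violation score for such a checker reduces to a two-way softmax over the logits of those two answer tokens:

```python
import math

def violation_probability(yes_logit: float, no_logit: float) -> float:
    """P("Yes" | image, policy) via a numerically stable two-way softmax."""
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Illustrative logits, not real model output:
print(round(violation_probability(2.0, 0.0), 3))  # ≈ 0.881
```

Thresholding this probability (rather than taking the raw "Yes"/"No" string) lets a caller tune the precision/recall trade-off per policy.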
Expected Processor Behavior
I have a question regarding the expected output behavior of the processor for ShieldGemma 2.
When running a minimal example, similar to the one on the model card page, I'm seeing the following:
import requests
from PIL import Image
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("google/shieldgemma-2-4b-it")
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=[image], policies=["dangerous"], return_tensors="pt")
print(processor.batch_decode(inputs['input_ids'])[0])
Output (image tokens truncated for readability):
<bos><bos><start_of_turn>user
<start_of_image><image_soft_token>...<end_of_image>
<start_of_turn>user
You are a policy expert trying to help determine whether a generated or real image is in
violation of the safety policy below. Your decision must very clearly follow the specific
language used in the safety policy and should not include any interpretation outside of the
safety policy.
<BEGIN_SAFETY_POLICY>
No Dangerous Content: The image shall not contain content that facilitates or encourages activities that could cause real-world harm (e.g., building firearms and explosive devices, promotion of terrorism, instructions for suicide).
</END_SAFETY_POLICY>
Does the image violate the above policy? Your Answer must start with 'Yes' or 'No'.
<end_of_turn>
<start_of_turn>model
There are two <bos> tokens, and the first <start_of_turn> token is not closed, but a second one is opened after <end_of_image>. Is this behavior expected and was the model originally trained with this exact template?
Hi @fhenkel, apologies for the late response.
In short, yes: both behaviors are expected, and the model was trained with this exact template.
There are two <bos> tokens because one is added by the tokenizer and the other comes from the ShieldGemma chat template, which itself includes the {{ bos_token }} variable.
As for the unclosed turn: even though it looks like two user turns, the model does not interpret it that way. The image is treated as a prefix payload to the subsequent user text, and the turn is only closed once <end_of_turn> appears. The second <start_of_turn>user acts as a continuation marker rather than a reset.
The model interprets the prompt like this:
[BOS][BOS]
USER:
[image tokens]
[policy + instructions]
MODEL:
Yes / No
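The double-BOS mechanics can be illustrated with a toy stand-in (pure Python; the real template is Jinja inside the tokenizer config, so the function names here are illustrative, not the transformers API):

```python
BOS = "<bos>"

def apply_chat_template(user_text: str) -> str:
    # Toy stand-in for the ShieldGemma chat template, which itself
    # contains the {{ bos_token }} variable.
    return f"{BOS}<start_of_turn>user\n{user_text}<end_of_turn>\n<start_of_turn>model\n"

def encode(text: str, add_bos: bool = True) -> str:
    # Toy stand-in for the tokenizer, which independently prepends its own BOS.
    return (BOS if add_bos else "") + text

prompt = encode(apply_chat_template("Does the image violate the above policy?"))
print(prompt.startswith(BOS + BOS))  # True: one BOS from each source
```

In real pipelines that double prepending is usually suppressed by passing `add_special_tokens=False` when tokenizing template output; here it is intentional, since ShieldGemma 2 was trained with both tokens present.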
If you have any more questions, please let us know.
Thanks