Instructions for using google/gemma-3-1b-it with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use google/gemma-3-1b-it with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-3-1b-it with vLLM:
Install with pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3-1b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-3-1b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
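The same endpoint can be called from Python as well. A minimal sketch using the `requests` package, assuming the `vllm serve` command above is still running on localhost:8000:

```python
# Query the vLLM server's OpenAI-compatible chat endpoint,
# mirroring the curl call above.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "google/gemma-3-1b-it",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
    },
)
# The generated reply is in the first choice's message content.
print(response.json()["choices"][0]["message"]["content"])
```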
Use Docker
```shell
docker model run hf.co/google/gemma-3-1b-it
```
- SGLang
How to use google/gemma-3-1b-it with SGLang:
Install with pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-3-1b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-3-1b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
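Because the server is OpenAI-compatible, the official `openai` Python client also works against it. A minimal streaming sketch, assuming the launch command above is serving on localhost:30000 (the api_key value is a placeholder; local servers generally ignore it):

```python
# Stream a chat completion from the local SGLang server via the OpenAI client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="google/gemma-3-1b-it",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,  # yield incremental chunks instead of one final message
)
for chunk in stream:
    # Each chunk carries an incremental piece of the reply.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```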
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "google/gemma-3-1b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-3-1b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use google/gemma-3-1b-it with Docker Model Runner:
```shell
docker model run hf.co/google/gemma-3-1b-it
```
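Docker Model Runner also exposes an OpenAI-compatible API, so the pulled model can be queried programmatically. A minimal sketch with the `openai` Python client; note that the base URL below (host port 12434, path /engines/v1) and the need to enable the TCP endpoint are assumptions, so check the Docker Model Runner documentation for your setup:

```python
# Query Docker Model Runner's OpenAI-compatible endpoint.
# ASSUMPTION: the host-side TCP endpoint is enabled and listens at
# localhost:12434 under /engines/v1; verify against the Docker docs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="hf.co/google/gemma-3-1b-it",  # same identifier used with `docker model run`
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```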
Remove development branch of transformers
The development branch of transformers for Gemma 3 (v4.49.0-Gemma-3) is no longer required, and in practice that branch is quite buggy. I suggest removing the branch requirement.
It actually took me a long time to weed through the bugs before realizing that the main branch of transformers already has Gemma 3 in working condition.
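For reference, dropping the dev branch is just a reinstall; a minimal sketch (the git URL assumes the standard huggingface/transformers GitHub repository, and a plain upgrade works once Gemma 3 support is in a stable release):

```shell
# Remove the v4.49.0-Gemma-3 dev-branch install:
pip uninstall -y transformers
# Install the current main branch from source (assumes the standard GitHub repo):
pip install git+https://github.com/huggingface/transformers
# ...or, once a stable release includes Gemma 3, upgrade normally:
# pip install -U transformers
```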
How were you able to use Gemma 3 with the main branch of transformers?
Actually, maybe not completely. I'm not interested in the image modality, so I haven't explored that, but for text I could get both the 1B and the 4B models working.
Even the text part causes trouble for me on the Gemma 3 branch (not to mention the bugs it causes for other models).
This is a bit hacky because 1) the tokenizer was not being loaded and 2) I had trouble with the image-text-to-text pipeline, but the normal text-generation pipeline works:
```python
import torch
import transformers
use_4b_model = True

# Workaround 1): the tokenizer was not being loaded automatically,
# so attach it to the pipeline explicitly.
pipe = transformers.pipeline("text-generation", model="google/gemma-3-1b-it", device="cuda", torch_dtype=torch.bfloat16)
pipe.tokenizer = transformers.AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

if use_4b_model:
    # Workaround 2): the image-text-to-text pipeline was problematic, so load
    # the full 4B checkpoint and swap only its language model into the pipeline.
    model = transformers.AutoModelForPreTraining.from_pretrained("google/gemma-3-4b-it", device_map="cuda", torch_dtype=torch.bfloat16)
    pipe.model = model.language_model
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write a poem on Hugging Face, the company"}
]
output = pipe(messages, max_new_tokens=50)
print(output[0]["generated_text"][-1]['content'])
# outputs:
# Okay, here's a poem about Hugging Face, aiming to capture its spirit and impact:
#
# **The Open Embrace**
#
# In realms of code, a vibrant hue,
# Hugging Face emerges, fresh and new.
# Not just a
```