Wrong configs
There are a few issues loading the Gemma3 model with AutoModelForCausalLM. The core problem is that the current config.json is set up for multi-modal usage (with "text_config" and "vision_config") but is missing key text fields at the top level (like "vocab_size" and "hidden_size") that the text-only classes look for. Specifically:
• There is no "vocab_size" field, yet the checkpoint’s embedding matrix is sized [262208, hidden_size] (because it has extra tokens for images).
• The text fields are nested under "text_config", but Gemma3ForCausalLM expects them at the top level (like config.hidden_size, config.num_hidden_layers, etc.).
• The uploaded config references "Gemma3ForConditionalGeneration", implying multi-modal usage. But for text-only usage, we must patch the config ourselves to match the real embedding dimension and top-level text fields.
Potential fixes:
1. Add text fields at the top level (e.g. "hidden_size": 2560, "vocab_size": 262208, etc.) so that AutoModelForCausalLM can read them directly without error.
2. Use a multi-modal class such as Gemma3ForConditionalGeneration that explicitly handles both text_config and vision_config if that’s the intended usage.
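As a rough illustration of fix 1, the sketch below works on a plain dictionary rather than the real config.json. The helper name `promote_text_fields` and the trimmed-down `multimodal_config` are hypothetical. Only the `hidden_size` and `vocab_size` values come from this thread; the other values are illustrative placeholders.

```python
import copy

# Hypothetical, trimmed-down stand-in for the multi-modal config.json.
# Only hidden_size=2560 is taken from the discussion; the rest is illustrative.
multimodal_config = {
    "architectures": ["Gemma3ForConditionalGeneration"],
    "text_config": {"hidden_size": 2560, "num_hidden_layers": 34},
    "vision_config": {"hidden_size": 1152},
}


def promote_text_fields(config, vocab_size):
    """Return a copy of config with the text_config fields copied to the top level."""
    patched = copy.deepcopy(config)
    for key, value in config.get("text_config", {}).items():
        # Keep any value that is already set at the top level.
        patched.setdefault(key, value)
    # The config has no vocab_size, so set it to match the embedding rows.
    patched["vocab_size"] = vocab_size
    return patched


patched = promote_text_fields(multimodal_config, vocab_size=262208)
print(patched["hidden_size"], patched["vocab_size"])
```

The original dictionary is left untouched, so the patched copy can be compared against it before anything is written back to disk.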
Fixing this manually shows that the model should load fine if this is addressed:
```python
import torch
from transformers import (
    AutoConfig,
    AutoTokenizer,
    pipeline,
)
from transformers.models.gemma3.configuration_gemma3 import Gemma3TextConfig
from transformers.models.gemma3.modeling_gemma3 import Gemma3ForCausalLM

# Name or local path of the Gemma3 model checkpoint
model_name = "google/gemma-3-4b-pt"

# Load the multi-modal config
multi_config = AutoConfig.from_pretrained(model_name)

# Extract the text-specific config to a dict
text_cfg_dict = multi_config.text_config.to_dict()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure the vocab size matches the checkpoint's embedding shape
# (the checkpoint has embed_tokens.weight of size [262208, 2560], so we set 262208).
text_cfg_dict["vocab_size"] = 262208

# Add any special token IDs from the tokenizer
if tokenizer.pad_token_id is not None:
    text_cfg_dict["pad_token_id"] = tokenizer.pad_token_id
text_cfg_dict["bos_token_id"] = tokenizer.bos_token_id
text_cfg_dict["eos_token_id"] = tokenizer.eos_token_id

# Build a text-only config
text_config = Gemma3TextConfig(**text_cfg_dict)

# Load the model using that text config
model = Gemma3ForCausalLM.from_pretrained(
    model_name,
    config=text_config,
    torch_dtype=torch.bfloat16,
    device_map=None,
    low_cpu_mem_usage=False,
)

# Create a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

prompt = "Eiffel tower is located in"
output = pipe(prompt, max_new_tokens=50)
print("Generated text:", output[0]["generated_text"])
```
Same problem here
Thank you for providing the details of the issue. To help us investigate, could you please let us know which Transformers version you were using when you encountered this error?
We can confirm that this issue has been resolved in Transformers 4.53.0.
Please try again after installing the latest Transformers version (4.53.0) with !pip install -U transformers. You can then load the gemma-3-4b-pt model with the following code:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-pt")
Please let us know if the issue still persists. Thank you.
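Before retrying, it can help to check which Transformers version is actually installed. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `is_new_enough` are hypothetical, and the 4.53.0 threshold is taken from the reply above, so treat it as an assumption (another reply in this thread reports that 4.52.4 also works).

```python
import re
from importlib import metadata

# Threshold taken from the reply above; an assumption, not a verified minimum.
MIN_VERSION = (4, 53, 0)


def parse_version(version):
    """Turn '4.53.0' or '4.54.0.dev0' into a tuple of its leading integers."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '4.53') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_new_enough(version, minimum=MIN_VERSION):
    """Return True if the version string is at least the minimum tuple."""
    return parse_version(version) >= minimum


try:
    installed = metadata.version("transformers")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed")
else:
    status = "ok" if is_new_enough(installed) else "older than 4.53.0"
    print(installed, status)
```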
I think it also works with transformers==4.52.4 (I tried this with LoRA training in Llamafactory efb13b7483249ca2cc0149d7b99d33ae0ba514b8).
That's good support. I can now load gemma3 models using AutoModelForCausalLM.