Instructions to use google/gemma-3-12b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-3-12b-it with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="google/gemma-3-12b-it")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it")
    model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-12b-it")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-3-12b-it with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "google/gemma-3-12b-it"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "google/gemma-3-12b-it",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image in one sentence."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                            }
                        }
                    ]
                }
            ]
        }'
Use Docker
docker model run hf.co/google/gemma-3-12b-it
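The curl call to the vLLM server above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server was started with `vllm serve` as shown and is reachable at `http://localhost:8000`. The helper names `build_payload` and `post_chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl example sends (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON.

    Assumes an OpenAI-compatible server is already running at base_url.
    """
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `post_chat("http://localhost:8000", build_payload("google/gemma-3-12b-it", "Describe this image in one sentence.", image_url))` should send the same request as the curl command, where `image_url` is any publicly reachable image address.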
- SGLang
How to use google/gemma-3-12b-it with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "google/gemma-3-12b-it" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "google/gemma-3-12b-it",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image in one sentence."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                            }
                        }
                    ]
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "google/gemma-3-12b-it" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "google/gemma-3-12b-it",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image in one sentence."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                            }
                        }
                    ]
                }
            ]
        }'
- Docker Model Runner
How to use google/gemma-3-12b-it with Docker Model Runner:
docker model run hf.co/google/gemma-3-12b-it
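Both the vLLM and SGLang servers above are called through an OpenAI-compatible API, so their JSON responses should follow the usual chat-completions layout, with the generated text under `choices[0].message.content`. The sketch below shows one way to pull that text out of a parsed response. The function name `extract_reply` is hypothetical, and the `sample` dictionary is hand-written for illustration; it is not real model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat-completions response dict.

    Assumes the OpenAI-style layout: response["choices"][0]["message"]["content"].
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content.strip()


# Hand-written sample in the OpenAI-style response layout (illustrative only).
sample = {
    "model": "google/gemma-3-12b-it",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " A placeholder reply. "},
            "finish_reason": "stop",
        }
    ],
}

reply = extract_reply(sample)
print(reply)
```

Raising on an empty `choices` list keeps error responses from being mistaken for an empty answer; a caller may prefer to inspect the raw response instead.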
- batched tokenization support (#34, opened 2 months ago by peiyan88; 1 comment)
- 📋 Documentation Enhancement Suggestion (#33, opened 3 months ago by CroviaTrust)
- 📋 Documentation Enhancement Suggestion (#32, opened 3 months ago by CroviaTrust)
- 📋 Documentation Enhancement Suggestion (#31, opened 3 months ago by CroviaTrust; 2 comments)
- Inference Widget (Featherless AI) is broken: severe hallucinations and repetition loops due to missing chat template (#30, opened 4 months ago by Vldcheeky; 5 comments)
- init request (#28, opened 9 months ago by nabavi)
- Update README.md (#27, opened 11 months ago by dj1507)
- cache_size_limit reached (#26, opened 11 months ago by uesenpai; 1 comment)
- CUDA error: misaligned address (#24, opened 12 months ago by msi-sbraun-11; 5 comments)
- Gemma 3 fine tuning max token length (#22, opened about 1 year ago by mukhayy; 5 comments)
- Please how to use Gemma to perform inference on an image (#20, opened about 1 year ago by oraekene; 1 comment)
- inference API (#19, opened about 1 year ago by opium80s; 1 comment)
- Update config.json (#18, opened about 1 year ago by GopiUppari; 👍 1, 1 comment)
- Lack of `max_position_embeddings` in config.json (#17, opened about 1 year ago by Zihao-Li; 1 comment)
- Did anyone else get the impression that the 12b model gives better textual responses than the 27b model? (#13, opened about 1 year ago by Lucena; 👍 1, 1 comment)
- here I did some tests if anyone is interested and plans to use this model..... (#12, opened about 1 year ago by Sreta; 👍 1, 5 comments)
- Chat template doesn't include "tool" role (#11, opened about 1 year ago by EduardZl; 3 comments)
- Image-to-text (OCR) functionality omits top-most line for recognition / output (#8, opened about 1 year ago by snowboarder04; 5 comments)
- [System prompt inside] Poor man's R1 based on Gemma 3 (#7, opened about 1 year ago by MrDevolver; ❤️ 6, 11 comments)
- Support BitsAndBytesConfig (#6, opened about 1 year ago by brand17; 👍 2, 1 comment)