Performance evaluation of Gemma 3-27b-it with different quantization methods (4-bit vs 8-bit)

#102
by Ryan1007 - opened

Hi team, I'm planning to deploy Gemma 3-27b-it on a consumer GPU with limited VRAM. I've noticed some performance variations when using 4-bit quantization (bitsandbytes). Have you guys performed any benchmarks on how much the reasoning capability drops compared to the FP16 version? Any recommended quantization parameters for maintaining logical consistency?

Hi @Ryan1007
Google has not published an official benchmark table specifically comparing bitsandbytes quantization to the FP16/BF16 base model for Gemma 3-27b-it. However, you can refer to the community-led benchmarks available on Reddit. I have included links to these benchmarks below for your reference.
https://www.reddit.com/r/LocalLLaMA/comments/1k6nrl1/i_benchmarked_the_gemma_3_27b_qat_models/
https://www.reddit.com/r/LocalLLaMA/comments/1k3jal4/gemma_3_qat_versus_other_q4_quants/

To maintain logical consistency, a good starting point is NF4 quantization with double quantization enabled, while keeping the compute dtype in FP16. In practice that means setting bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, and bnb_4bit_compute_dtype=torch.float16. We've generally seen NF4 hold up better than plain linear 4-bit, especially around outlier weights, and double quantization recovers a bit more fidelity, which translates into more stable reasoning. Please let me know if this setup helps you.

Thanks
