Instructions to use google/gemma-3-1b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-3-1b-it with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-3-1b-it with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3-1b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/google/gemma-3-1b-it
```
- SGLang
How to use google/gemma-3-1b-it with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-3-1b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-3-1b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use google/gemma-3-1b-it with Docker Model Runner:
```shell
docker model run hf.co/google/gemma-3-1b-it
```
What transformers version can this be deployed with?
I tried deploying this model on AWS SageMaker, but it seems the transformers library doesn't have an up-to-date release yet that handles Gemma 3.
How can this be deployed?
Ok thanks. The issue I'm having, though, is that deploying on SageMaker requires specifying a transformers version, and there hasn't been a new release with Gemma 3 support yet. I also tried extending a Deep Learning Container by installing the dev build (4.50.0.dev0) but ran into some compatibility issues.
What would be the easiest way for me to deploy this on sagemaker?
A new stable version of Transformers is now available that is compatible with Gemma 3. Please update it using `pip install -U transformers` and try again. Let us know if this helps! Thank you
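Before redeploying, it can help to verify that the environment will actually pick up a new enough release; Gemma 3 support shipped in the Transformers 4.50.0 stable release. A minimal sketch of such a check, comparing only the numeric release part of a version string (the helper names here are illustrative, not part of any library):

```python
def release_tuple(version: str):
    """Parse the numeric release part of a version string,
    e.g. "4.50.0.dev0" -> (4, 50, 0)."""
    parts = []
    for piece in version.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # stop at suffixes like "dev0" or "rc1"
    return tuple(parts)

def supports_gemma3(version: str) -> bool:
    # Gemma 3 support shipped in the 4.50.0 stable release of Transformers.
    # Caveat: this simple check treats pre-releases such as "4.50.0.dev0"
    # the same as the final release.
    return release_tuple(version) >= (4, 50, 0)

print(supports_gemma3("4.49.0"))   # False
print(supports_gemma3("4.50.3"))   # True
```

In a live environment you would pass `transformers.__version__` to the check; `packaging.version.parse` is the more robust choice when the `packaging` library is available, since it handles pre-release ordering correctly.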
I had a similar issue, and even the newer version didn't help. It gave me the error below.
```
Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 21, in <module>
    from sagemaker_huggingface_inference_toolkit import serving
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 18, in <module>
    from sagemaker_huggingface_inference_toolkit import handler_service, mms_model_server
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 28, in <module>
    from sagemaker_huggingface_inference_toolkit.transformers_utils import (
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 24, in <module>
    from transformers.pipelines import Conversation, Pipeline
ImportError: cannot import name 'Conversation' from 'transformers.pipelines' (/opt/conda/lib/python3.10/site-packages/transformers/pipelines/__init__.py)
```
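The traceback points at the SageMaker inference toolkit rather than the model itself: `transformers_utils.py` unconditionally imports `Conversation` from `transformers.pipelines`, but that class (along with the conversational pipeline) was removed in newer Transformers releases, so a toolkit built against the older API fails at import time. A minimal sketch reproducing just that mismatch, guarding the same import the toolkit performs unconditionally:

```python
# The toolkit runs: from transformers.pipelines import Conversation, Pipeline
# In Transformers releases where Conversation has been removed, that line
# raises ImportError before the server can even start.
try:
    from transformers.pipelines import Conversation  # noqa: F401
    has_conversation = True
except ImportError:
    # Raised when the class was removed (ModuleNotFoundError, a subclass,
    # also lands here if transformers is not installed at all).
    has_conversation = False

print(has_conversation)
```

In other words, upgrading Transformers alone is not enough here: the container's `sagemaker_huggingface_inference_toolkit` must also be new enough not to depend on the removed class.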