When evaluating on Wiki2, I just get Loss: NaN, whereas with gemma-3-1b-it it works.
Why doesn't it work for the -pt version? Can someone help?
    model = AutoModelForCausalLM.from_pretrained(
        args.path,
        torch_dtype=getattr(torch, args.torch_dtype.split('.')[-1]),
        trust_remote_code=True,
    ).to("cuda" if torch.cuda.is_available() else "cpu").eval()

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)
        loss = outputs.loss
        if torch.isnan(loss):
            print(f"NaN loss at i={i}, begin={begin}, end={end}")
            continue
Hi @jonny-vr ,
Welcome to the Google Gemma family of open-source models. The primary distinction between the pre-trained (pt) and instruction-tuned (it) models lies in their training objectives. Pre-trained models are trained on general text from sources such as Wikipedia and books, whereas instruction-tuned models undergo further training specifically to follow instructions.
I ran both the pre-trained and the instruction-tuned model locally and evaluated their loss values. Both models produce numeric (non-NaN) loss values. Please find the attached gist file for your reference. I tested with the sample IDs from the example code.
Key points to consider:
- Kindly verify that the parameters and arguments, particularly the data type, being passed to the model are correct. The use of unsupported data types can lead to incorrect loading of model weights, resulting in erroneous outputs.
- The issue may stem from the input and label IDs provided to the model.
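As a rough illustration of the two points above, here is a minimal sketch of checks that could be run on each evaluation window before calling the model. The helper names `check_ids` and `check_dtype` are hypothetical, and the causes they look for are assumptions rather than a confirmed diagnosis: IDs outside the vocabulary, windows in which every label is masked with -100 (a mean over zero scored tokens is NaN), and float16, whose narrow range is a commonly reported source of overflow.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default


def check_ids(input_ids: Sequence[int], target_ids: Sequence[int],
              vocab_size: int) -> List[str]:
    """Return human-readable problems found in one evaluation window."""
    problems = []
    if len(input_ids) != len(target_ids):
        problems.append("input_ids and target_ids differ in length")

    bad_inputs = [t for t in input_ids if not 0 <= t < vocab_size]
    if bad_inputs:
        problems.append(f"{len(bad_inputs)} input id(s) outside [0, {vocab_size})")

    bad_labels = [t for t in target_ids
                  if t != IGNORE_INDEX and not 0 <= t < vocab_size]
    if bad_labels:
        problems.append(f"{len(bad_labels)} label id(s) outside [0, {vocab_size})")

    # Causal LMs shift labels by one position, so the first label is never
    # scored. If everything after it is masked, the mean loss is 0/0 = NaN.
    scored = [t for t in target_ids[1:] if t != IGNORE_INDEX]
    if not scored:
        problems.append("no scored labels after the causal shift")
    return problems


def check_dtype(name: str) -> List[str]:
    """Flag dtypes with a narrow range; accepts 'float16' or 'torch.float16'."""
    short = name.split(".")[-1]  # same parsing as the snippet in the question
    if short == "float16":
        # Assumption: overflow in float16 is one possible NaN source here.
        return ["float16 may overflow to inf/NaN; try bfloat16 or float32"]
    return []
```

With tensors of shape (1, seq_len), the lists can be obtained with `input_ids[0].tolist()` and `target_ids[0].tolist()`, and the vocabulary size is typically available as `model.config.vocab_size`. If both checks come back empty for a window that still yields NaN, inspecting `outputs.logits` for inf or NaN values is a reasonable next step.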
If you require any further help, reach out to me. I'm more than happy to help you out.
Thanks.