Issues with preparing inputs for sequence-to-sequence learning
I am training T5Gemma for Word-in-Context (WiC) binary classification, framed as a text-to-text problem (the same as in the original T5 paper). However, the model predicts the same label for every example. Initially, I noticed that the tokenizer does not add the end-of-sequence (EOS) token, so I append it in my code. Before that change, the model generated output like "falsetruetruetruetrue" until it reached the maximum number of tokens. Now, after adding EOS, it predicts only "true".
PS: The code below works with "google-t5/t5-small".
Any help here? Code below:
from datasets import load_dataset
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import EvalPrediction
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
# Load the WiC task from SuperGLUE as a Hugging Face dataset
dataset = load_dataset("super_glue", "wic")
# Initialize tokenizer and model
model_name = "google/t5gemma-b-b-ul2-it"
# model_name = "google-t5/t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, attn_implementation="eager")
def compute_metrics(eval_pred: EvalPrediction):
    predictions, labels = eval_pred
    # Decode predicted token IDs to strings
    pred_str = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Put the pad token back where the labels were masked with -100
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    label_str = tokenizer.batch_decode(labels, skip_special_tokens=True)
    print(pred_str)
    print(label_str)
    # Convert "true"/"false" strings to 1/0
    pred_labels = [1 if p.strip().lower() == "true" else 0 for p in pred_str]
    true_labels = [1 if l.strip().lower() == "true" else 0 for l in label_str]
    # Compute precision, recall, F1
    precision, recall, f1_score, _ = precision_recall_fscore_support(
        true_labels, pred_labels, average="binary"
    )
    accuracy = accuracy_score(true_labels, pred_labels)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1_score,
    }
# Preprocessing function
def preprocess(example):
    input_text = f"sentence1: {example['sentence1']} sentence2: {example['sentence2']} word: {example['word']}"
    target_text = "true" if example["label"] == 1 else "false"
    target_text = target_text + tokenizer.eos_token
    # Tokenize inputs and targets
    model_inputs = tokenizer(
        input_text, max_length=128, truncation=True, padding="max_length"
    )
    labels = tokenizer(target_text, max_length=5, truncation=True, padding="max_length")
    # Replace pad token IDs in labels with -100 so they're ignored by the loss
    labels_ids = labels["input_ids"]
    labels_ids = [
        label if label != tokenizer.pad_token_id else -100 for label in labels_ids
    ]
    model_inputs["labels"] = labels_ids
    return model_inputs
# Tokenize dataset
tokenized_dataset = dataset.map(
    preprocess, remove_columns=dataset["train"].column_names
)
# Training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-wic",
    eval_strategy="epoch",
    per_device_train_batch_size=32,
    num_train_epochs=10,
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    predict_with_generate=True,
    bf16=True,
)
print(tokenized_dataset["train"][0])
print(tokenizer.decode(tokenized_dataset["train"][0]["input_ids"]))
# Replace -100 with the pad token ID so the labels can be decoded
labels = [
    label if label != -100 else tokenizer.pad_token_id
    for label in tokenized_dataset["train"][0]["labels"]
]
print(tokenizer.decode(labels))
# Initialize Trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    compute_metrics=compute_metrics,
)
# Train the model
trainer.train()
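# Note: the SuperGLUE test split has no gold labels (label == -1), which preprocess()
# maps to "false", so metrics on this split are not meaningful.
metrics = trainer.evaluate(tokenized_dataset["test"])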
print("Final metrics:")
print(metrics)
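A possible way to make the EOS handling described in the question explicit is to build the label IDs by hand instead of appending tokenizer.eos_token to the target string. The sketch below is only an illustration: build_labels and IGNORE_INDEX are hypothetical names, and it assumes the target was tokenized without special tokens, for example with tokenizer(target_text, add_special_tokens=False)["input_ids"].

```python
from typing import List

# -100 is the label value that the cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def build_labels(token_ids: List[int], eos_id: int, max_length: int) -> List[int]:
    """Append EOS if it is missing, truncate without dropping EOS, pad with IGNORE_INDEX.

    Hypothetical helper; assumes max_length >= 1.
    """
    ids = list(token_ids)
    # Make sure the sequence ends with exactly one trailing EOS.
    if not ids or ids[-1] != eos_id:
        ids.append(eos_id)
    # If the sequence is too long, keep room for EOS in the last position.
    if len(ids) > max_length:
        ids = ids[: max_length - 1] + [eos_id]
    # Pad with the ignore index so padding does not contribute to the loss.
    return ids + [IGNORE_INDEX] * (max_length - len(ids))


# Toy IDs only: 5 stands in for the "true" token, 1 for EOS.
print(build_labels([5], eos_id=1, max_length=4))
```

Printing a few processed examples this way, and decoding them as the question's code already does, should show whether each label ends in a single EOS followed only by ignored positions.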
Hi,
Thanks for reaching out, and welcome to Google's Gemma family of open-source models. Please try the following suggestions:
Step 1 (Must-Do):
Action: Explicitly set a lower learning_rate in Seq2SeqTrainingArguments. Start with 1e-4 or 5e-5.
Rationale: Addresses the numerical instability inherent in large, modern models (T5Gemma) combined with bf16.
Step 2 (Must-Do):
Action: Check the tokenization of the labels.
Rationale: Ensure that tokenizer("true" + tokenizer.eos_token) is short (e.g., 2 or 3 tokens) and correctly tokenizes true or false as distinct tokens.
Step 3 (Optional but good):
Action: Add gradient clipping to your training arguments to prevent potential gradient explosions in bf16 training.
Rationale: Adds stability. (Seq2SeqTrainingArguments inherits max_grad_norm from TrainingArguments, with a default of 1.0, so clipping is normally already active; lower the value if you want stricter clipping.)
Step 4 (Verify):
Action: Temporarily run a validation/test step before training starts to ensure the compute_metrics and generation are working as expected with the initial, untrained model.
Rationale: Isolates the issue: is it in the setup or the training process?
Thanks.
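The learning-rate and gradient-clipping suggestions above could be expressed as a small set of keyword arguments. This is a sketch rather than a recommended configuration: stability_kwargs is a hypothetical name, while learning_rate and max_grad_norm are, to my knowledge, regular TrainingArguments fields that Seq2SeqTrainingArguments inherits.

```python
# Keyword arguments intended for Seq2SeqTrainingArguments.
# The values shown are the library defaults as far as I know, which matches
# the later observation in this thread that the defaults were already in range.
stability_kwargs = {
    "learning_rate": 5e-5,  # the reply suggests starting with 1e-4 or 5e-5
    "max_grad_norm": 1.0,   # gradient-norm clipping threshold
}

# Intended usage (not executed here, requires transformers):
# training_args = Seq2SeqTrainingArguments(output_dir="./t5-wic", **stability_kwargs)
print(stability_kwargs)
```

For the verification step, calling trainer.evaluate() once before trainer.train() runs generation and compute_metrics on the untrained model, which helps separate setup problems from training problems.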
Thanks for the reply.
The default parameters are already in the range you specified, so tweaking them resulted in no change.
The only thing that worked was turning off bf16 and using full precision.
Arguments and training don't work efficiently
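To give a feel for why switching from bf16 to full precision can matter, the sketch below rounds ordinary floats to the nearest bfloat16 value using only the standard library. It is an illustration of bfloat16's coarse resolution (8 significant bits), not a diagnosis of what went wrong in this particular run; round_to_bf16 is a hypothetical helper and it ignores NaN, infinity, and overflow.

```python
import struct


def round_to_bf16(x: float) -> float:
    """Round a Python float to the nearest bfloat16 value (round-to-nearest-even).

    Sketch only: NaN, infinity, and overflow are not handled.
    """
    # Reinterpret the value as a float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Rounding bias: ties go to the value whose kept mantissa is even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    # Keep only the upper 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# A change of 1e-3 around 1.0 is below bfloat16 resolution and disappears.
print(round_to_bf16(1.0 + 1e-3))     # 1.0
# The next representable bfloat16 value above 1.0 is 1 + 2**-7.
print(round_to_bf16(1.0 + 2 ** -7))  # 1.0078125
```

Near 1.0, bfloat16 can only represent steps of 2**-7, so small differences between values may vanish after rounding, whereas float32 keeps about 24 significant bits.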
Hi, apologies for the late reply. Could you please confirm whether you need any further assistance beyond the precision-related concern?
Thanks.
No need. After switching to full precision, it trained fine.