Instructions to use google/gemma-2-27b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-2-27b-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-27b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-27b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use google/gemma-2-27b-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-2-27b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-27b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
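
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="google/gemma-2-27b-it"):
    """Build (but do not send) a chat completion request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
```

With the server running, `print(send_chat_request(request))` would print the model's answer. For the SGLang server, passing `base_url="http://localhost:30000"` should be the only change needed.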

Use Docker

docker model run hf.co/google/gemma-2-27b-it

SGLang

How to use google/gemma-2-27b-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-2-27b-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-27b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-2-27b-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-27b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use google/gemma-2-27b-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-2-27b-it
```

What is the difference between the "google/gemma-2-27b-it" and "google/gemma-2-27b" models?

#38

by GeniusMind - opened Sep 28, 2024

Discussion

GeniusMind

Sep 28, 2024

Hi.
What is the difference between the "google/gemma-2-27b-it" and "google/gemma-2-27b" models? I can't find any info about this.

Renu11

Google org Sep 30, 2024

edited Sep 30, 2024

Gemma models are released in two main variants, a pre-trained model and an instruction-tuned model, each available in several weight sizes. Pre-trained models, also known as base models, do not have the 'it' suffix in their names ("google/gemma-2-27b"), whereas instruction-tuned models do ("google/gemma-2-27b-it").

The difference between pre-trained (base) models and instruction-tuned (it) models:
Pre-trained models are general-purpose models trained on a large amount of data. They can be adapted to various tasks, but their performance and output quality on any specific task will vary. This is where instruction-tuned models come in: they are trained to follow instructions and generate higher-quality text. Instruction-tuned models can also be fine-tuned on domain-specific data to improve performance and output quality for a particular use case.
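
As a rough illustration of that difference, the sketch below contrasts how each variant is typically prompted. The helper `format_gemma_chat` is hypothetical and only approximates the chat template that ships with the tokenizer (in practice `tokenizer.apply_chat_template` builds this string). The "Question:/Answer:" framing for the base model is likewise just one possible way to set up a completion.

```python
def format_gemma_chat(messages, add_generation_prompt=True):
    """Hypothetical helper approximating the Gemma 2 chat template."""
    parts = ["<bos>"]
    for message in messages:
        # The template calls the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation starts with the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


question = "What is the capital of France?"

# Base model (google/gemma-2-27b): plain text that the model simply continues.
base_prompt = f"Question: {question}\nAnswer:"

# Instruction-tuned model (google/gemma-2-27b-it): the question is wrapped
# in user/model turn markers that the model was tuned to follow.
it_prompt = format_gemma_chat([{"role": "user", "content": question}])

print(base_prompt)
print(it_prompt)
```

The base model has no notion of turns, so it may keep writing further questions and answers after the first one, while the instruction-tuned model is expected to answer and then end its turn.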

You can also refer to this similar issue.

sabermalek

Apr 14, 2025

@Renu11 can you please give some examples?
