Do I need to apply_chat_template before Supervised Fine-tuning Gemma-1.1-7b-it?

#19

by Syax19 - opened Apr 22, 2024

Discussion

Syax19

Apr 22, 2024

edited Apr 23, 2024

I'm training an LLM for the first time and would greatly appreciate any help.

I noticed that in the "example_sft_qlora.py" script, there's a formatting function defined as:

def formatting_func(example):
    text = f"### USER: {example['data'][0]}\n### ASSISTANT: {example['data'][1]}"
    return text

I don't see apply_chat_template being used there, unlike when the Gemma-1.1-7b-it model is used for inference.
Is that because supervised fine-tuning doesn't require the original template?
Will my custom template, like the one in formatting_func, overwrite the original template during training, or do I need to modify formatting_func to apply the original chat template?

Looking forward to your reply.
Thanks!

lkv

Google org Jul 16, 2024

Hi @Syax19, supervised fine-tuning does not apply the model's original chat template on its own; the model is trained on whatever text formatting_func returns. Your custom template therefore does not overwrite the original template, which stays in the tokenizer, and you can train with it without modifying formatting_func. If you want to train with the original chat template instead, you would need to modify formatting_func to produce text in that format. Either way, prompt the fine-tuned model at inference with the same format it was trained on; for the original format, the tokenizer's apply_chat_template method builds it for you.
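
For illustration, here is a rough sketch of what a formatting_func that follows the original Gemma turn format might look like. It assumes, as in the snippet from the question, that example['data'] holds a [user, assistant] pair. The helper names to_messages and render_gemma_turns and the optional tokenizer argument are hypothetical and not part of example_sft_qlora.py. The turn markers are written out by hand so the sketch runs without downloading the tokenizer; the real template also prepends the BOS token, which this approximation leaves out.

```python
# Gemma's turn markers, written by hand (an approximation of the real template).
START, END = "<start_of_turn>", "<end_of_turn>"


def to_messages(example):
    """Turn a [user, assistant] pair into chat messages (hypothetical helper)."""
    user_text, assistant_text = example["data"][0], example["data"][1]
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


def render_gemma_turns(messages):
    """Render messages in the Gemma turn format, without the BOS token."""
    parts = []
    for message in messages:
        # Gemma labels the assistant turn "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"{START}{role}\n{message['content'].strip()}{END}\n")
    return "".join(parts)


def formatting_func(example, tokenizer=None):
    """Format one training example with the chat template.

    The tokenizer argument is an assumption of this sketch: if one is passed,
    it renders its own template as a string; otherwise the hand-written
    approximation above is used.
    """
    messages = to_messages(example)
    if tokenizer is not None:
        return tokenizer.apply_chat_template(messages, tokenize=False)
    return render_gemma_turns(messages)


example = {"data": ["Who are you?", "I am a language model."]}
print(formatting_func(example))
```

If the trainer expects a function of one argument, the tokenizer can be bound beforehand, for example with functools.partial(formatting_func, tokenizer=tokenizer).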

Syax19

Jul 16, 2024

Thanks for your help!
