Instructions for using google/gemma-2-2b-it with libraries, inference providers, notebooks, and local apps.

Libraries

How to use google/gemma-2-2b-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-2b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
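To make the two less obvious steps above concrete, here is a minimal sketch in plain Python. It approximates the prompt string that `apply_chat_template` builds for this model and shows why the output is sliced before decoding. The helper names `render_gemma_prompt` and `strip_prompt` are hypothetical, and the control tokens reflect the Gemma turn format as commonly documented; the authoritative template is the one shipped with the tokenizer.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate the Gemma chat format as a plain string (illustrative only)."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma labels assistant turns "model"; user turns stay "user".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant's reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


def strip_prompt(output_ids, prompt_length):
    """Keep only newly generated token ids, mirroring outputs[0][prompt_len:]."""
    return output_ids[prompt_length:]


print(render_gemma_prompt([{"role": "user", "content": "Who are you?"}]))
```

`generate` returns the prompt tokens followed by the new tokens, which is why the snippet above slices from `inputs["input_ids"].shape[-1]` before decoding.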

Local Apps

vLLM

How to use google/gemma-2-2b-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-2-2b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
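The curl call above can also be issued from application code. The following is a sketch using only the Python standard library, assuming the vLLM server from the previous step is listening on `localhost:8000`. The function names are hypothetical, and the response parsing assumes the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, model="google/gemma-2-2b-it",
                       base_url="http://localhost:8000"):
    """Build the same POST request as the curl command."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, timeout=60):
    """Send the request; requires a server already running at base_url."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `ask("What is the capital of France?")` should return the model's answer as a string. Passing `base_url="http://localhost:30000"` to `build_chat_request` would target a server on a different port.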

Use Docker

docker model run hf.co/google/gemma-2-2b-it

SGLang

How to use google/gemma-2-2b-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-2-2b-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
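A server started this way may take a while to load the model before it accepts requests, so a client that calls it immediately can fail with a connection error. A minimal sketch of a wait loop follows, assuming the host and port used above. The helper name `wait_for_port` is hypothetical, and an open port is only a rough readiness signal: it shows the process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=30000, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is accepting connections yet; retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` before the first request keeps a startup script from racing the server.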

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-2-2b-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Can I get the API for this so I can make an app with it?

#21

by riggscodes - opened Aug 6, 2024

Discussion

riggscodes

Aug 6, 2024

Hey all, I am pretty new to the field of AI app development, and I am mainly targeting the method of getting an AI API and implementing it within the app I make. Can I do the same with this particular model?

Xenova

Aug 6, 2024

https://huggingface.co/google/gemma-2-2b-it/discussions/21?inference_api=true should be able to help you out :)

Just note that the free Inference API is subject to rate limits and usage demands. If you need an inference solution for production, check out Inference Endpoints instead.
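One common way for an app to cope with rate limits is to retry with exponential backoff. The sketch below is generic and not tied to any particular client library: `RateLimited` is a hypothetical exception that your own request function would raise when the service signals a rate limit (assumed here to be an HTTP 429 response), and `call_with_backoff` is a hypothetical helper name.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's request function on HTTP 429."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying with exponential backoff when rate limited."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; let the caller handle it.
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the retry schedule can be checked in tests without actually waiting.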

ytjhai

Aug 6, 2024

Very few models are "plug and play" with inference endpoints and usually need a sample "handler.py" file after cloning the repository for it to work properly. Is this documentation you can provide to support more seamless deployments? Qwen models are the only ones that deploy seamlessly that I've tried.

lkv

Google org Dec 24, 2024

Hi @riggscodes, sorry for the late response. Yes, you can integrate this particular AI model into your app via the OpenAI API. OpenAI provides clear documentation and client libraries to help you get started, and there are plenty of tools to help you manage the integration. Just keep in mind API costs, rate limits, and security when developing and deploying your app.

Thank you.
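On the security point, a small and widely used precaution is to keep the access token out of source code and read it from the environment at run time. The sketch below assumes the token is stored in an `HF_TOKEN` environment variable and sent as a bearer token; the helper name `auth_headers` is hypothetical.

```python
import os


def auth_headers(env=None, var_name="HF_TOKEN"):
    """Build request headers with a bearer token read from the environment."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"Set {var_name} before calling the API")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers of whatever HTTP client the app uses, so the token never appears in the repository.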
