GemmaTokenizer does not exist or is not currently imported

#17

by fallenlu - opened Feb 23, 2024

Discussion

fallenlu

Feb 23, 2024

ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.

from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login
#from huggingface_hub import snapshot_download
#snapshot_download(repo_id="google/gemma-2b")


login("<your_hf_token>")
model_path = "/Users/macbook/.cache/huggingface/hub/models--google--gemma-2b/snapshots/9d067f00def958594aaa16b39a65b07d69ca655b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
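
A minimal sketch of the same loading step that keeps the access token out of the script and uses the repo id instead of a local snapshot path. The `HF_TOKEN` environment variable name and the helper names `get_hf_token` and `load_gemma` are assumptions made for illustration; this does not fix the GemmaTokenizer error by itself, which depends on the installed transformers version.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Read the Hugging Face access token from the environment (hypothetical helper)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token


def load_gemma(model_id="google/gemma-2b"):
    """Load tokenizer and model by repo id (hypothetical helper)."""
    # Imported lazily so the token helper can be used without transformers installed.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    token = get_hf_token()
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
    return tokenizer, model
```

Calling `tokenizer, model = load_gemma()` would then replace the `login(...)` call and the two `from_pretrained(model_path)` lines above.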

saocristovia121

Feb 23, 2024

same error for me as well

ValueError Traceback (most recent call last)
in <cell line: 4>()
2 from transformers import AutoTokenizer, AutoModelForCausalLM
3
----> 4 tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
5 model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
6

/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
809 tokenizer_class.register_for_auto_class()
810 return tokenizer_class.from_pretrained(
--> 811 pretrained_model_name_or_path, *inputs, trust_remote_code=trust_remote_code, **kwargs
812 )
813 elif config_tokenizer_class is not None:

ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.

yingfengOnHuggingFace

Feb 23, 2024

reinstall transformers

pip uninstall transformers

pip install transformers

ybelkada

Feb 23, 2024

pip install -U transformers should solve the issue. If not, make sure to create a fresh new env
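
To confirm whether the upgrade actually took effect, a small check along these lines may help. It is a sketch that assumes Gemma support (including GemmaTokenizer) first shipped in transformers 4.38.0; the helper names are made up for illustration.

```python
from importlib import metadata

# Assumption: GemmaTokenizer is available from transformers 4.38.0 onward.
MIN_GEMMA_VERSION = (4, 38, 0)


def parse_version(text):
    """Turn '4.38.2' or '4.39.0.dev0' into a comparable tuple of three ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_gemma(version_text):
    """True if the given transformers version is at least the assumed minimum."""
    return parse_version(version_text) >= MIN_GEMMA_VERSION


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed in this environment")
elif supports_gemma(version):
    print(f"transformers {version} should include GemmaTokenizer")
else:
    print(f"transformers {version} is too old; run: pip install -U transformers")
```

Run it with the same interpreter that raises the error, since a notebook kernel and a terminal can point at different environments.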

llmbg

Feb 23, 2024

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")

The tokenizer is not imported with this code, and I get the error below.
I already updated the tokenizer but got the same error:

ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.

osanseviero

Google org Feb 23, 2024

Hi there! Did you upgrade transformers by doing pip install -U transformers?

llmbg

Feb 26, 2024

did but got the same error as below:

ValueError Traceback (most recent call last)
in <cell line: 5>()
3
4 # tokenizer = GemmaTokenizer.from_preset("google/gemma-2b", use_auth_token=True)
----> 5 tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b", use_auth_token=True)
6 model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")

/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
697 Will be passed to the Tokenizer __init__() method. Can be used to set special tokens like
698 bos_token, eos_token, unk_token, sep_token, pad_token, cls_token, mask_token,
--> 699 additional_special_tokens. See parameters in the __init__() for more details.
700
701 Examples:

ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.

I am running in google colab, which has python 3.10

llmbg

Feb 26, 2024

issue resolved, after updating python from 3.10 to 3.11

divnwork

Feb 26, 2024

Hi, just restart the session, but before that, remove any pip install transformers command and replace it with the line below.
!pip -q install git+https://github.com/huggingface/transformers.git

Install other packages like trl or peft separately using !pip install trl peft
It started working for me.

FlameSub

Feb 27, 2024

it worked for me too! thanks

sparshhhhs

Feb 28, 2024

it worked for me too! thanks

Hey! Can you please share your colab code? I'm getting multiple errors, it will be helpful thanks

WQW

Mar 1, 2024

reinstall transformers

pip uninstall transformers

pip install transformers

After reinstallation, the version of transformers still stays the same

transformers-4.30.2
# python
Python 3.7.3
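
One possible reason the version stays at 4.30.2 is the Python version itself. The sketch below assumes that transformers releases with Gemma support (4.38 and later) require Python 3.8 or newer, in which case pip on Python 3.7.3 cannot select them no matter how often the package is reinstalled. The function name is hypothetical.

```python
import sys

# Assumption: transformers 4.38+ declares a minimum of Python 3.8.
MIN_PYTHON = (3, 8)


def python_can_install_gemma_support(version_info=sys.version_info):
    """True if this interpreter meets the assumed minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if python_can_install_gemma_support():
    print("Python is new enough; pip install -U transformers can fetch a Gemma-capable release")
else:
    print("Python is too old; pip will keep resolving to an older transformers release")
```

If the check fails, creating a fresh environment on a newer Python is likely needed before upgrading transformers.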

ben-epstein

Mar 1, 2024

@WQW You need to upgrade it

pip install --upgrade transformers

A simple install will use the cached version of your already installed package

osanseviero

Google org Mar 3, 2024

Colab comes with an older transformers version pre-installed. If you install or upgrade to the latest version, you will need to restart the runtime so it picks up the newly installed version
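
A rough way to tell whether a restart is still pending is to compare the version of the module already loaded in the running session with the version pip has installed on disk. This is only a sketch: the helper names are invented here, and it relies on the package exposing `__version__`, which transformers does.

```python
import sys
from importlib import metadata


def is_stale(in_memory, on_disk):
    """True when both versions are known and they differ."""
    return in_memory is not None and on_disk is not None and in_memory != on_disk


def needs_restart(package="transformers"):
    """True if `package` is already imported and differs from the installed version."""
    module = sys.modules.get(package)
    if module is None:
        # Not imported yet, so the next import will load whatever is installed.
        return False
    try:
        on_disk = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return is_stale(getattr(module, "__version__", None), on_disk)


if needs_restart():
    print("Restart the runtime: the loaded transformers is older than the installed one")
else:
    print("No stale transformers module detected in this session")
```

In Colab this would be run in a cell right after the pip upgrade; if it reports a stale module, restart the runtime and re-run the imports.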

WQW

Mar 3, 2024

@WQW You need to upgrade it
pip install --upgrade transformers
A simple install will use the cached version of your already installed package

I've already updated to 4.38

mmh7

Mar 5, 2024

pip install transformers==4.38.2 and it works

osanseviero changed discussion status to closed Apr 10, 2024
