Instructions to use google/gemma-2-2b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-2-2b-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-2b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use google/gemma-2-2b-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-2-2b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/google/gemma-2-2b-it

SGLang

How to use google/gemma-2-2b-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-2-2b-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-2-2b-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use google/gemma-2-2b-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-2-2b-it
```

[solved] error downloading model!

#11

by ZeroWw - opened Jul 31, 2024

Discussion

ZeroWw

Jul 31, 2024

On huggingface I read: Gated model
You have been granted access to this model

And I can download it on huggingface website.

But if I use huggingface-cli (using the same account)

I get:


Fetching 12 files:   0% 0/12 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/google/gemma-2b-it/resolve/4cf79afa15bef73c0b98ff5937d8e57d6071ef71/.gitattributes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/huggingface-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/huggingface_cli.py", line 51, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/download.py", line 146, in run
    print(self._download())  # Print path to downloaded files
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/download.py", line 180, in _download
    return snapshot_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/_snapshot_download.py", line 294, in snapshot_download
    thread_map(
  File "/usr/local/lib/python3.10/dist-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/_snapshot_download.py", line 268, in _inner_hf_hub_download
    return hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1202, in hf_hub_download
    return _hf_hub_download_to_local_dir(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1440, in _hf_hub_download_to_local_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
    raise head_call_error
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66aaa8f7-66fb512034a03a9b63412dae;745fcb6f-2cf7-4e61-a5c3-4929c5167f21)

Cannot access gated repo for url https://huggingface.co/google/gemma-2b-it/resolve/4cf79afa15bef73c0b98ff5937d8e57d6071ef71/.gitattributes.
Access to model google/gemma-2b-it is restricted. You must be authenticated to access it.

Xenova

Jul 31, 2024

•

edited Jul 31, 2024

Hi there! Can you confirm that you have logged in with an access token of yours?

huggingface-cli login

ZeroWw

Jul 31, 2024

Hi there! Can you confirm that you have logged in with an access token of yours?

yes.

ZeroWw

Jul 31, 2024

I just tried again, even created 2 new access tokens. Same as before. It seems a problem with huggingface-cli or something else (on gated repos)
I just tested with gemma-7b (which worked before) and I have the same output.
Hmm same output with mistral too...
it seems a huggingface problem!

Xenova

Jul 31, 2024

•

edited Jul 31, 2024

I am unable to reproduce, meaning I believe this is an issue with your environment. Can you try with Google colab (or a python venv)?

Also, are you trying to download google/gemma-2-2b-it or google/gemma-2b-it?

tokenprovider

Aug 1, 2024

works fine for me

tokenprovider

Aug 1, 2024

guys how to use this in terms like ChatGPT? do i need to find a chat code and use this model as an API ? Also can it remember previous conversation?

wyklq

Aug 1, 2024

You need an API framework, e.g. ollama or vLLM.

ZeroWw

Aug 1, 2024

•

edited Aug 1, 2024

My bad. An almost invisible typo in a colab notebook caused all this.

ZeroWw changed discussion status to closed Aug 1, 2024

lambodriver

Aug 1, 2024

@wyklq what about LM studio ? can I use it there?

wyklq

Aug 2, 2024

Sorry, no idea about it. Please find it out by yourself.

ZeroWw

Aug 2, 2024

As I said. it was my fault. A typo in my code.
That's why I closed this discussion.
Sorry for the confusion.

ZeroWw changed discussion title from error downloading model! to [solved] error downloading model! Aug 2, 2024

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment