Instructions to use google/gemma-2-2b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-2-2b-it with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="google/gemma-2-2b-it") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it") model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-2-2b-it with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "google/gemma-2-2b-it" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/gemma-2-2b-it", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/google/gemma-2-2b-it
- SGLang
How to use google/gemma-2-2b-it with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "google/gemma-2-2b-it" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/gemma-2-2b-it", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "google/gemma-2-2b-it" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "google/gemma-2-2b-it", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use google/gemma-2-2b-it with Docker Model Runner:
docker model run hf.co/google/gemma-2-2b-it
[solved] error downloading model!
On huggingface I read: Gated model
You have been granted access to this model
And I can download it on huggingface website.
But if I use huggingface-cli (using the same account)
I get:
Fetching 12 files: 0% 0/12 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/google/gemma-2b-it/resolve/4cf79afa15bef73c0b98ff5937d8e57d6071ef71/.gitattributes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/huggingface-cli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/huggingface_cli.py", line 51, in main
service.run()
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/download.py", line 146, in run
print(self._download()) # Print path to downloaded files
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/commands/download.py", line 180, in _download
return snapshot_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/_snapshot_download.py", line 294, in snapshot_download
thread_map(
File "/usr/local/lib/python3.10/dist-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
File "/usr/local/lib/python3.10/dist-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/_snapshot_download.py", line 268, in _inner_hf_hub_download
return hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1202, in hf_hub_download
return _hf_hub_download_to_local_dir(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1440, in _hf_hub_download_to_local_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
raise head_call_error
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
hf_raise_for_status(response)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66aaa8f7-66fb512034a03a9b63412dae;745fcb6f-2cf7-4e61-a5c3-4929c5167f21)
Cannot access gated repo for url https://huggingface.co/google/gemma-2b-it/resolve/4cf79afa15bef73c0b98ff5937d8e57d6071ef71/.gitattributes.
Access to model google/gemma-2b-it is restricted. You must be authenticated to access it.
Hi there! Can you confirm that you have logged in with an access token of yours?
huggingface-cli login
I just tried again, even created 2 new access tokens. Same as before. It seems a problem with huggingface-cli or something else (on gated repos)
I just tested with gemma-7b (which worked before) and I have the same output.
Hmm same output with mistral too...
it seems a huggingface problem!
I am unable to reproduce, meaning I believe this is an issue with your environment. Can you try with Google colab (or a python venv)?
Also, are you trying to download google/gemma-2-2b-it or google/gemma-2b-it?
works fine for me
guys how to use this in terms like ChatGPT? do i need to find a chat code and use this model as an API ? Also can it remember previous conversation?
You need an API framework, e.g. ollama or vLLM.
My bad. An almost invisible typo in a colab notebook caused all this.
Sorry, no idea about it. Please find it out by yourself.
As I said. it was my fault. A typo in my code.
That's why I closed this discussion.
Sorry for the confusion.