"Cannot access gated repo" after being granted access to this model
#37
by yangzhangmc - opened
The model page shows "Gated model: You have been granted access to this model", but I still get the following error:
>>> import torch
>>> from transformers import pipeline
>>>
>>> pipe = pipeline(
...     "text-generation",
...     model="google/gemma-2-9b",
...     device="cuda",  # replace with "mps" to run on a Mac device
... )
Traceback (most recent call last):
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/google/gemma-2-9b/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1854, in _raise_on_head_call_error
    raise head_call_error
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1751, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1673, in get_hf_file_metadata
    r = _request_wrapper(
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 376, in _request_wrapper
    response = _request_wrapper(
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 400, in _request_wrapper
    hf_raise_for_status(response)
  File "/root/qlora_ft_model_hpo/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-66c0bda2-7faaa32c5f0ca5312e24a076;795882a8-4727-4740-8003-dac22bb33c63)

Cannot access gated repo for url https://huggingface.co/google/gemma-2-9b/resolve/main/config.json.
Access to model google/gemma-2-9b is restricted and you are not in the authorized list. Visit https://huggingface.co/google/gemma-2-9b to ask for access.
I am able to access other Hugging Face models (e.g., meta-llama/Meta-Llama-3.1-8B) with the same Hugging Face account.
Hi @yangzhangmc,
Could you please re-check the access token you have configured and make sure the token you are using has access to the gemma-2-9b model?
Thank you.
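One way to re-check a token is to send it directly to the URL that failed in the traceback and look at the HTTP status that comes back. The sketch below is only an illustration using the Python standard library: the helper names (`build_request`, `explain_status`, `check_access`) are made up for this example, the status-code explanations are a rough heuristic rather than an official table, and it assumes the token is stored in the `HF_TOKEN` environment variable.

```python
import os
import urllib.error
import urllib.request

# URL copied from the traceback in this thread.
CONFIG_URL = "https://huggingface.co/google/gemma-2-9b/resolve/main/config.json"


def build_request(url, token):
    """Build a HEAD request that carries the token as a Bearer header."""
    request = urllib.request.Request(url, method="HEAD")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def explain_status(status):
    """Map an HTTP status code to a likely cause (heuristic, hypothetical helper)."""
    if status == 200:
        return "ok: this token can read the gated repo"
    if status == 401:
        return "unauthorized: no token was sent, or the token is invalid"
    if status == 403:
        return ("forbidden: the token is valid, but its account or its "
                "permissions do not allow reading this gated repo")
    return f"unexpected status {status}"


def check_access(url=CONFIG_URL, token=None):
    """Send the request and describe the outcome; needs network access."""
    # Assumption: the token lives in HF_TOKEN when none is passed explicitly.
    token = token or os.environ.get("HF_TOKEN")
    try:
        with urllib.request.urlopen(build_request(url, token), timeout=30) as response:
            return explain_status(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return explain_status(error.code)


if __name__ == "__main__":
    print(check_access())
```

If this reports "forbidden" while the model page says access was granted, the token being sent probably belongs to a different account or was created without permission to read gated repositories, so generating a new token under the account that was granted access is a reasonable next step.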
yangzhangmc changed discussion status to closed