Instructions for using google/gemma-3-1b-it with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use google/gemma-3-1b-it with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-3-1b-it with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3-1b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/google/gemma-3-1b-it
```
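Since the vLLM server exposes an OpenAI-compatible API, the curl request above can also be built and sent from Python. This is a minimal sketch: the endpoint and model name come from the snippet above, and the actual HTTP call is left commented because it requires the server to be running.

```python
import json

# Same OpenAI-compatible payload as the curl example above.
payload = {
    "model": "google/gemma-3-1b-it",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
}
body = json.dumps(payload)

# To actually send it (requires the vLLM server started above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```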
- SGLang
How to use google/gemma-3-1b-it with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-3-1b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-3-1b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use google/gemma-3-1b-it with Docker Model Runner:
```shell
docker model run hf.co/google/gemma-3-1b-it
```
OSError: ./gemma-3-1b-it does not appear to have a file named preprocessor_config.json.
Hi, it seems that the `preprocessor_config.json` file is missing. I have never seen this file before. I ran into this problem when trying to quantize the model with llm-compressor.
```
Traceback (most recent call last):
  File "/home/zjnyly/LLMs/llm-compressor.py", line 87, in <module>
    oneshot(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/compressed_tensors/utils/helpers.py", line 190, in wrapped
    return func(*args, **kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 33, in oneshot
    oneshot(**kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 178, in oneshot
    one_shot = Oneshot(**kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 110, in __init__
    pre_process(model_args)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/utils.py", line 58, in pre_process
    model_args.processor = initialize_processor_from_path(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/utils.py", line 240, in initialize_processor_from_path
    processor = AutoProcessor.from_pretrained(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 347, in from_pretrained
    return processor_class.from_pretrained(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/processing_utils.py", line 1079, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/processing_utils.py", line 1143, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 467, in from_pretrained
    raise initial_exception
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 449, in from_pretrained
    config_dict, _ = ImageProcessingMixin.get_image_processor_dict(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/image_processing_base.py", line 340, in get_image_processor_dict
    resolved_image_processor_file = cached_file(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/utils/hub.py", line 266, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/utils/hub.py", line 381, in cached_files
    raise OSError(
OSError: ./gemma-3-1b-it does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/./gemma-3-1b-it/tree/main' for available files.
```
Hi @zjnyly,
I have reproduced the issue in Colab. The error occurs because the quantization process tries to read the `preprocessor_config.json` file (to resolve the tokenizer) when the `oneshot` function from llm-compressor is called, but the google/gemma-3-1b-it repository does not contain such a config file. You can pass the `tokenizer` parameter to the `oneshot` function when running the quantization.
Please find the following gist file for your reference.
Thanks.
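To make that workaround concrete, here is a minimal sketch of passing the tokenizer explicitly to `oneshot`. The dataset and recipe names are placeholders (assumptions, not from this thread), and the heavy call itself is left commented since it needs llm-compressor installed and a GPU.

```python
MODEL_ID = "google/gemma-3-1b-it"

def build_oneshot_kwargs(model_path: str) -> dict:
    """Arguments for llm-compressor's oneshot(); passing `tokenizer`
    explicitly avoids the AutoProcessor lookup that fails on the
    missing preprocessor_config.json."""
    return {
        "model": model_path,
        "tokenizer": model_path,     # the workaround: explicit tokenizer
        "dataset": "open_platypus",  # placeholder calibration dataset
        "recipe": "recipe.yaml",     # placeholder quantization recipe
    }

kwargs = build_oneshot_kwargs(MODEL_ID)

# from llmcompressor import oneshot
# oneshot(**kwargs)  # requires llm-compressor and a GPU
```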
Thanks for your help!
For anyone who hits the same issue when using SFTTrainer: pass processing_class=tokenizer to SFTTrainer instead of tokenizer=tokenizer.
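As a sketch of that SFTTrainer fix (the tokenizer here is a hypothetical stand-in; the point is the keyword name, since recent trl releases accept `processing_class` where older ones took `tokenizer`):

```python
class DummyTokenizer:
    """Stand-in for AutoTokenizer.from_pretrained("google/gemma-3-1b-it")."""

tokenizer = DummyTokenizer()

# Correct keyword for recent trl releases:
trainer_kwargs = {
    "model": "google/gemma-3-1b-it",
    "processing_class": tokenizer,  # not tokenizer=tokenizer
}

# from trl import SFTTrainer
# trainer = SFTTrainer(**trainer_kwargs)  # requires trl and model weights
```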