Instructions for using google/gemma-3-1b-it with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use google/gemma-3-1b-it with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-3-1b-it with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-3-1b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/google/gemma-3-1b-it
```
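Since the vLLM server exposes an OpenAI-compatible API, the curl request above can also be built and sent from Python. This is a minimal sketch: the endpoint and model name come from the snippet above, and the actual HTTP call is left commented because it requires the server to be running.

```python
import json

# Same OpenAI-compatible payload as the curl example above.
payload = {
    "model": "google/gemma-3-1b-it",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
}
body = json.dumps(payload)

# To actually send it (requires the vLLM server started above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```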
- SGLang
How to use google/gemma-3-1b-it with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "google/gemma-3-1b-it" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "google/gemma-3-1b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google/gemma-3-1b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use google/gemma-3-1b-it with Docker Model Runner:
```shell
docker model run hf.co/google/gemma-3-1b-it
```
OSError: ./gemma-3-1b-it does not appear to have a file named preprocessor_config.json.
Hi, it seems that the `preprocessor_config.json` file is missing. I have never seen this file before. I ran into this problem when trying to quantize the model with llm-compressor.
```
Traceback (most recent call last):
  File "/home/zjnyly/LLMs/llm-compressor.py", line 87, in <module>
    oneshot(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/compressed_tensors/utils/helpers.py", line 190, in wrapped
    return func(*args, **kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 33, in oneshot
    oneshot(**kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 178, in oneshot
    one_shot = Oneshot(**kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 110, in __init__
    pre_process(model_args)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/utils.py", line 58, in pre_process
    model_args.processor = initialize_processor_from_path(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/llmcompressor/entrypoints/utils.py", line 240, in initialize_processor_from_path
    processor = AutoProcessor.from_pretrained(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 347, in from_pretrained
    return processor_class.from_pretrained(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/processing_utils.py", line 1079, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/processing_utils.py", line 1143, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 467, in from_pretrained
    raise initial_exception
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 449, in from_pretrained
    config_dict, _ = ImageProcessingMixin.get_image_processor_dict(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/image_processing_base.py", line 340, in get_image_processor_dict
    resolved_image_processor_file = cached_file(
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/utils/hub.py", line 266, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "/home/zjnyly/miniconda3/envs/py310_new/lib/python3.10/site-packages/transformers/utils/hub.py", line 381, in cached_files
    raise OSError(
OSError: ./gemma-3-1b-it does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/./gemma-3-1b-it/tree/main' for available files.
```
Hi @zjnyly,
I have reproduced the issue in Colab. The error occurs because the quantization process tries to read the `preprocessor_config.json` file (to resolve the tokenizer) when the `oneshot` function from llm-compressor is called, but the google/gemma-3-1b-it repository does not contain such a config file. You can pass the `tokenizer` parameter to the `oneshot` function when running the quantization.
Please find the following gist file for your reference.
Thanks.
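To make that workaround concrete, here is a minimal sketch of passing the tokenizer explicitly to `oneshot`. The dataset and recipe names are placeholders (assumptions, not from this thread), and the heavy call itself is left commented since it needs llm-compressor installed and a GPU.

```python
MODEL_ID = "google/gemma-3-1b-it"

def build_oneshot_kwargs(model_path: str) -> dict:
    """Arguments for llm-compressor's oneshot(); passing `tokenizer`
    explicitly avoids the AutoProcessor lookup that fails on the
    missing preprocessor_config.json."""
    return {
        "model": model_path,
        "tokenizer": model_path,     # the workaround: explicit tokenizer
        "dataset": "open_platypus",  # placeholder calibration dataset
        "recipe": "recipe.yaml",     # placeholder quantization recipe
    }

kwargs = build_oneshot_kwargs(MODEL_ID)

# from llmcompressor import oneshot
# oneshot(**kwargs)  # requires llm-compressor and a GPU
```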
Thanks for your help!
For anyone who hits the same issue when using SFTTrainer: pass processing_class=tokenizer to SFTTrainer instead of tokenizer=tokenizer.
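As a sketch of that SFTTrainer fix (the tokenizer here is a hypothetical stand-in; the point is the keyword name, since recent trl releases accept `processing_class` where older ones took `tokenizer`):

```python
class DummyTokenizer:
    """Stand-in for AutoTokenizer.from_pretrained("google/gemma-3-1b-it")."""

tokenizer = DummyTokenizer()

# Correct keyword for recent trl releases:
trainer_kwargs = {
    "model": "google/gemma-3-1b-it",
    "processing_class": tokenizer,  # not tokenizer=tokenizer
}

# from trl import SFTTrainer
# trainer = SFTTrainer(**trainer_kwargs)  # requires trl and model weights
```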