Towards fine-tune perfection
For experimental purposes, I would like to fine-tune the gemma-3-270m-it model. I believe it is a good basis for performing tasks in Hungarian. Out of the box the model struggles with Hungarian, but after relatively short fine-tuning it learns the language well. For now, I am using LoRA to train it on extractive question answering from a given context (quasi-RAG). My first question: was a special system prompt used for this kind of task during the model's instruction-following training? Second, is the instruction dataset itself available anywhere? (It would be very useful for avoiding catastrophic forgetting and for alignment purposes.)
Thanks.
Hey @GaborMadarasz,
It is great to see you adapting our Gemma-3-270m-it model for Hungarian extractive QA via LoRA. To answer your first question: no special or hidden system prompt was used during instruction tuning. As described in our official Gemma formatting and system instructions documentation, the chat format supports two roles: user and model. Any high-level instructions are included within the user turn. For fine-tuning, you can mirror this format by placing the task instructions and context directly inside the <start_of_turn>user ... <end_of_turn> block.
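To make the format concrete, here is a minimal sketch of how a training example for extractive QA might be laid out with the instruction and context inside the user turn. The helper names and the "Context:" / "Question:" labels are hypothetical choices for illustration, not something prescribed by the Gemma documentation; in practice the tokenizer's chat template would normally produce the turn markers and the leading BOS token for you.

```python
# Sketch only: helper names and the "Context:"/"Question:" labels are
# illustrative assumptions, not part of any official Gemma recipe.

def build_prompt(instruction: str, context: str, question: str) -> str:
    """Put the task instruction, context and question into a single user turn."""
    user_content = (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    # There is no separate system role, so everything goes in the user turn.
    return (
        f"<start_of_turn>user\n{user_content}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def build_training_text(instruction: str, context: str,
                        question: str, answer: str) -> str:
    """Append the target answer and close the model turn."""
    return build_prompt(instruction, context, question) + f"{answer}<end_of_turn>\n"


if __name__ == "__main__":
    text = build_training_text(
        instruction="Answer using only a span copied from the context.",
        context="Budapest is the capital of Hungary.",
        question="What is the capital of Hungary?",
        answer="Budapest",
    )
    print(text)
```

At inference time only `build_prompt` would be used, and generation continues from the open model turn.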
Regarding your second question, the specific instruction-tuning dataset used for the Gemma 3 family has not been publicly released. If you are concerned about catastrophic forgetting, a common approach is to mix your task-specific data with general instruction-following data, which helps preserve alignment and instruction adherence during fine-tuning.
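One way the mixing idea could look in practice is sketched below. It uses only the standard library and assumes both datasets are already in memory as lists; the function name and the 20% default share of general data are illustrative assumptions, not a recommended ratio.

```python
import random


def mix_examples(task_examples, general_examples,
                 general_fraction=0.2, seed=0):
    """Return a shuffled list in which roughly `general_fraction` of the
    items come from `general_examples` and the rest are the task examples.

    The 0.2 default is an arbitrary illustration, not a tuned value.
    """
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")

    rng = random.Random(seed)
    # Solve n_general / (n_task + n_general) = general_fraction for n_general.
    n_general = round(len(task_examples) * general_fraction
                      / (1.0 - general_fraction))
    # Never ask for more general examples than are available.
    n_general = min(n_general, len(general_examples))

    mixed = list(task_examples) + rng.sample(list(general_examples), n_general)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    task = [f"hu-qa-{i}" for i in range(80)]
    general = [f"general-{i}" for i in range(100)]
    mixed = mix_examples(task, general, general_fraction=0.2)
    print(len(mixed), sum(x.startswith("general-") for x in mixed))
```

All task examples are kept and the general data is subsampled, so the size of the general pool does not change how often the Hungarian QA examples are seen.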
Thank you!
Thank you for your reply, @srikanta-221
I will train the models along these lines, and I hope Hungarian adaptations of the gemma-3-270m variants will arrive soon.