Instructions for using kontextdev/agent-gemma with libraries, inference providers, notebooks, and local apps.

Libraries

How to use kontextdev/agent-gemma with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="kontextdev/agent-gemma")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("kontextdev/agent-gemma")
model = AutoModelForMultimodalLM.from_pretrained("kontextdev/agent-gemma")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use kontextdev/agent-gemma with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kontextdev/agent-gemma"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kontextdev/agent-gemma",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/kontextdev/agent-gemma

SGLang

How to use kontextdev/agent-gemma with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kontextdev/agent-gemma" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kontextdev/agent-gemma",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kontextdev/agent-gemma" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kontextdev/agent-gemma",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use kontextdev/agent-gemma with Docker Model Runner:
```
docker model run hf.co/kontextdev/agent-gemma
```

Agent Gemma — Gemma 3n E2B Fine-Tuned for Function Calling

A fine-tuned version of google/gemma-3n-E2B-it trained for on-device function calling using Google's FunctionGemma technique.

What's Different from Stock Gemma 3n

Fixed: `format_function_declaration` Template Error

The stock Gemma 3n chat template uses format_function_declaration() — a custom Jinja2 function available in Google's Python tokenizer but not supported by LiteRT-LM's on-device template engine. This causes:

Failed to apply template: unknown function: format_function_declaration is unknown (in template:21)

This model replaces the stock template with a LiteRT-LM compatible template that uses only standard Jinja2 features (tojson filter, <start_function_declaration> / <end_function_declaration> markers). The template is embedded in both tokenizer_config.json and chat_template.jinja.

Function Calling Format

The model uses the FunctionGemma markup format:

<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>
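
Code that consumes the model's output has to pull the function name and arguments back out of this markup. The following is a minimal parsing sketch, not the official FunctionGemma parser. It assumes the call looks exactly like the line above, that every argument value is wrapped in a pair of `<escape>` markers, and that multiple arguments (if any) simply follow one another inside the braces. `parse_function_call` is a hypothetical helper name.

```python
import re

# One call: <start_function_call>call:NAME{ARGS}<end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>call:(?P<name>[\w.\-]+)\{(?P<args>.*?)\}<end_function_call>",
    re.DOTALL,
)
# One argument: key:<escape>value<escape>
ARG_RE = re.compile(r"(?P<key>\w+):<escape>(?P<value>.*?)<escape>", re.DOTALL)


def parse_function_call(text):
    """Return (name, args) for the first call found in text, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    args = {
        m.group("key"): m.group("value")
        for m in ARG_RE.finditer(match.group("args"))
    }
    return match.group("name"), args


sample = (
    "<start_function_call>call:get_weather"
    "{location:<escape>Tokyo<escape>}<end_function_call>"
)
print(parse_function_call(sample))
```

All values come back as strings; converting them to numbers or booleans would need the tool's parameter schema.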

Tool declarations are formatted as:

<start_function_declaration>{"name": "get_weather", "parameters": {...}}<end_function_declaration>
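
To illustrate the declaration format outside the template, here is a small sketch that builds the same kind of string in plain Python. It uses `json.dumps` as a stand-in for the Jinja2 `tojson` filter, so spacing and key order may differ from what the real template emits. Unwrapping a `{"function": {...}}` wrapper is an assumption based on the tool shape used in the Transformers example, and `format_declaration` is a hypothetical helper name.

```python
import json


def format_declaration(tool):
    """Wrap one tool schema in declaration markers, serialized as JSON."""
    # Accept either a bare schema or the {"function": {...}} wrapper (assumed)
    schema = tool.get("function", tool)
    return (
        "<start_function_declaration>"
        + json.dumps(schema)
        + "<end_function_declaration>"
    )


tool = {
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
        },
    }
}
print(format_declaration(tool))
```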

Training Details

Base model: google/gemma-3n-E2B-it (5.4B parameters)
Method: QLoRA (rank=16, alpha=32) — 22.9M trainable parameters (0.42%)
Dataset: google/mobile-actions (8,693 training samples)
Training: 500 steps, batch_size=1, max_seq_length=512, learning_rate=2e-4
Precision: bfloat16
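
As a quick sanity check on these figures, the arithmetic below recomputes the trainable-parameter fraction and estimates how much of the dataset 500 steps cover. The coverage estimate assumes no gradient accumulation, which the list above does not state.

```python
total_params = 5.4e9       # base model size from the list above
trainable_params = 22.9e6  # LoRA adapter parameters
steps = 500
batch_size = 1
dataset_size = 8693

trainable_pct = 100 * trainable_params / total_params
samples_seen = steps * batch_size  # assumes gradient accumulation of 1
coverage_pct = 100 * samples_seen / dataset_size

print(f"trainable: {trainable_pct:.2f}%")
print(f"samples seen: {samples_seen} ({coverage_pct:.1f}% of the dataset)")
```

Under that assumption the run sees 500 of the 8,693 samples, which is well under one epoch.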

Usage

With LiteRT-LM on Android (Kotlin)

// After converting to .litertlm format
val engine = Engine(EngineConfig(modelPath = "agent-gemma.litertlm"))
engine.initialize()

val conversation = engine.createConversation(
    ConversationConfig(
        systemMessage = Message.of("You are a helpful assistant."),
        tools = listOf(MyToolSet())  // @Tool annotated class
    )
)

// No format_function_declaration error!
conversation.sendMessageAsync(Message.of("What's the weather?"))
    .collect { print(it) }

With Transformers (Python)

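from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kontextdev/agent-gemma")
tokenizer = AutoTokenizer.from_pretrained("kontextdev/agent-gemma")

messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"}
]

# One tool, described by a JSON schema for its parameters
tools = [{"function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}}]

# Render the prompt, including the tool declarations, with the custom chat template
text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
# Keep the inputs on the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens; special tokens are kept so the function-call markup stays visible
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens))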

Chat Template

The custom chat template (in tokenizer_config.json and chat_template.jinja) supports these roles:

developer / system — system instructions + tool declarations
user — user messages
model / assistant — model responses, including tool_calls
tool — tool execution results
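
The role list above can be turned into a small pre-check that catches misspelled roles before a conversation reaches the template. This is only a sketch: which name in each pair the template treats as canonical is an assumption, the `tool_calls` shape follows the common Transformers message convention rather than anything confirmed here, and `normalize_roles` is a hypothetical helper name.

```python
# Role aliases as listed above; the canonical side of each pair is assumed
ROLE_ALIASES = {"system": "developer", "assistant": "model"}
KNOWN_ROLES = {"developer", "user", "model", "tool"}


def normalize_roles(messages):
    """Return a copy of messages with aliases mapped; raise on unknown roles."""
    normalized = []
    for message in messages:
        role = ROLE_ALIASES.get(message["role"], message["role"])
        if role not in KNOWN_ROLES:
            raise ValueError(f"unsupported role: {message['role']}")
        normalized.append({**message, "role": role})
    return normalized


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": {"location": "Tokyo"}}}
        ],
    },
    {"role": "tool", "content": "Sunny, 22 C"},  # made-up tool result
]
print([m["role"] for m in normalize_roles(conversation)])
```

Since the template accepts both names in each pair, the mapping is not required for rendering; it only gives downstream code a single vocabulary to work with.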

Converting to .litertlm

Use the LiteRT-LM conversion tools to package for on-device deployment:

# The chat_template.jinja is included in this repo
python scripts/convert-to-litertlm.py \
  --model_dir kontextdev/agent-gemma \
  --output agent-gemma.litertlm

Files

model-*.safetensors — Merged model weights (bfloat16)
tokenizer_config.json — Tokenizer config with embedded chat template
chat_template.jinja — Standalone chat template file
config.json — Model architecture config
checkpoint-* — Training checkpoints (LoRA)

License

This model inherits the Gemma license from the base model.

