Use with vLLM
Install vLLM from pip and serve the model:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kontextdev/agent-gemma"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kontextdev/agent-gemma",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
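The same request can be sent from Python. Below is a minimal sketch that uses only the standard library (json, urllib.request). It assumes the server started above is listening on localhost:8000. The helper names build_request and send are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumes `vllm serve "kontextdev/agent-gemma"` is running locally on the default port.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "kontextdev/agent-gemma"


def build_request(prompt: str, image_url: Optional[str] = None) -> dict:
    """Build an OpenAI-style chat completion body mirroring the curl example."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": MODEL_ID, "messages": [{"role": "user", "content": content}]}


def send(body: dict, endpoint: str = ENDPOINT) -> dict:
    """POST the body as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_request("Describe this image in one sentence.", url))["choices"][0]["message"]["content"]` reads the reply text from the OpenAI-compatible response shape.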
Use Docker
docker model run hf.co/kontextdev/agent-gemma

Agent Gemma: Gemma 3n E2B Fine-Tuned for Function Calling

A fine-tuned version of google/gemma-3n-E2B-it trained for on-device function calling using Google's FunctionGemma technique.

What's Different from Stock Gemma 3n

Fixed: format_function_declaration Template Error

The stock Gemma 3n chat template calls format_function_declaration(). This is a custom Jinja2 function that Google's Python tokenizer provides, but LiteRT-LM's on-device template engine does not support it. Applying the stock template on device therefore fails with:

Failed to apply template: unknown function: format_function_declaration is unknown (in template:21)

This model replaces the stock template with a LiteRT-LM-compatible template. The replacement uses only standard Jinja2 features: the tojson filter plus literal <start_function_declaration> / <end_function_declaration> markers. The template is embedded in tokenizer_config.json and is also shipped as chat_template.jinja.
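A quick heuristic check for the problems described above can save a failed on-device run when you convert another fine-tune. The sketch below only searches the template text and is not a Jinja2 parser, so passing it does not prove LiteRT-LM compatibility. The helpers load_template and check_template are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import List, Optional

# The custom function named in the LiteRT-LM error message above.
UNSUPPORTED_FUNCTION = "format_function_declaration"
# Literal markers the replacement template emits around each tool declaration.
MARKERS = ("<start_function_declaration>", "<end_function_declaration>")


def load_template(model_dir: str) -> Optional[str]:
    """Return the chat_template string from tokenizer_config.json, if any."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.exists():
        return None
    template = json.loads(config_path.read_text(encoding="utf-8")).get("chat_template")
    # Some configs store a list of named templates; this sketch only handles a single string.
    return template if isinstance(template, str) else None


def check_template(template: str) -> List[str]:
    """List likely LiteRT-LM incompatibilities found by plain text search."""
    problems = []
    # A call site looks like `format_function_declaration(`, possibly with spaces.
    if re.search(rf"\b{UNSUPPORTED_FUNCTION}\s*\(", template):
        problems.append(f"calls unsupported function {UNSUPPORTED_FUNCTION}()")
    for marker in MARKERS:
        if marker not in template:
            problems.append(f"missing marker {marker}")
    if "tojson" not in template:
        problems.append("does not use the tojson filter")
    return problems
```

An empty list from `check_template(load_template("path/to/local/repo") or "")` means none of these patterns was found.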

Function Calling Format

The model uses the FunctionGemma markup format:

<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>
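Below is a sketch of parsing that markup back into a tool name and its arguments. The example above shows only one parameter, so two things here are assumptions: multiple arguments are separated by commas, and non-string values may appear without <escape> delimiters. All values come back as strings. parse_function_calls is a hypothetical helper, not part of any library.

```python
import re
from typing import Dict, List, Tuple

# Whole call: tool name, then everything between the braces.
CALL_RE = re.compile(
    r"<start_function_call>call:(?P<name>[\w.\-]+)\{(?P<args>.*?)\}<end_function_call>",
    re.DOTALL,
)
# One argument: key, then either an <escape>-delimited value or a bare token (assumed form).
ARG_RE = re.compile(
    r"(?P<key>\w+):(?:<escape>(?P<escaped>.*?)<escape>|(?P<bare>[^,}]*))",
    re.DOTALL,
)


def parse_function_calls(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Extract (tool_name, arguments) pairs from model output."""
    calls = []
    for match in CALL_RE.finditer(text):
        args = {}
        for arg in ARG_RE.finditer(match.group("args")):
            value = arg.group("escaped")
            if value is None:
                value = arg.group("bare").strip()
            args[arg.group("key")] = value
        calls.append((match.group("name"), args))
    return calls
```

Escaped values are consumed whole, so a colon or comma inside `<escape>…<escape>` (for example `07:30`) does not split an argument.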

Tool declarations are formatted as:

<start_function_declaration>{"name": "get_weather", "parameters": {...}}<end_function_declaration>
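Each declaration is the tool's JSON schema wrapped in the two markers. The sketch below reproduces this format in Python, which helps when inspecting prompts. The template itself produces this text with tojson, so key order and whitespace in the real rendered prompt may differ from json.dumps output. format_declaration and format_declarations are hypothetical helpers.

```python
import json
from typing import List

START = "<start_function_declaration>"
END = "<end_function_declaration>"


def format_declaration(tool: dict) -> str:
    """Wrap one tool schema in declaration markers."""
    # Accept either {"type": "function", "function": {...}} or a bare function schema.
    schema = tool.get("function", tool)
    return f"{START}{json.dumps(schema, ensure_ascii=False)}{END}"


def format_declarations(tools: List[dict]) -> str:
    """Concatenate the declarations for every tool, in order."""
    return "".join(format_declaration(t) for t in tools)
```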

Training Details

  • Base model: google/gemma-3n-E2B-it (5.4B parameters)
  • Method: QLoRA (rank=16, alpha=32); 22.9M trainable parameters (0.42%)
  • Dataset: google/mobile-actions (8,693 training samples)
  • Training: 500 steps, batch_size=1, max_seq_length=512, learning_rate=2e-4
  • Precision: bfloat16
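The figures above fit together with a little arithmetic. The sketch below uses the numbers from the list and assumes no gradient accumulation, since none is listed. It also assumes the standard LoRA scaling of alpha / rank. lora_params is a hypothetical helper for a single linear layer.

```python
# Figures copied from the Training Details list above.
TOTAL_PARAMS = 5.4e9
TRAINABLE_PARAMS = 22.9e6
TRAIN_SAMPLES = 8_693
STEPS = 500
BATCH_SIZE = 1
LORA_RANK = 16
LORA_ALPHA = 32

# Share of the model's weights that the adapters add and train.
trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS

# Assumes no gradient accumulation, since none is listed.
samples_seen = STEPS * BATCH_SIZE
epoch_fraction = samples_seen / TRAIN_SAMPLES

# Standard LoRA scales the adapter update B @ A by alpha / rank.
LORA_SCALE = LORA_ALPHA / LORA_RANK


def lora_params(d_in: int, d_out: int, rank: int = LORA_RANK) -> int:
    """Adapter size for one d_in x d_out linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


print(f"trainable: {trainable_pct:.2f}% of weights")
print(f"samples seen: {samples_seen} ({epoch_fraction:.1%} of one epoch)")
print(f"LoRA scale: {LORA_SCALE}")
```

Under these assumptions, training saw 500 samples, about 5.8% of one pass over the dataset. For a hypothetical 2048 x 2048 projection, `lora_params(2048, 2048)` returns 65,536.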

Usage

With LiteRT-LM on Android (Kotlin)

// After converting to .litertlm format
val engine = Engine(EngineConfig(modelPath = "agent-gemma.litertlm"))
engine.initialize()

val conversation = engine.createConversation(
    ConversationConfig(
        systemMessage = Message.of("You are a helpful assistant."),
        tools = listOf(MyToolSet())  // MyToolSet: your own class with @Tool-annotated methods
    )
)

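// The tool declarations now render without the format_function_declaration error.
// If sendMessageAsync returns a Kotlin Flow, call collect from inside a coroutine.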
conversation.sendMessageAsync(Message.of("What's the weather?"))
    .collect { print(it) }

With Transformers (Python)

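import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kontextdev/agent-gemma"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"}
]

# OpenAI-style tool schema, as accepted by apply_chat_template(tools=...)
tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}}]

text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
# Gemma chat templates typically emit <bos> themselves, so don't add special tokens a second time
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens; special tokens are kept, so function-call markers stay visible
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:]))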
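If the decoded text contains a function call, the next step is to run the tool and feed the result back to the model. The sketch below handles that step with several assumptions:

  • get_weather is a hypothetical stub.
  • The parser handles only <escape>-wrapped values.
  • The assistant tool_calls and tool message shapes follow the common Transformers convention. Whether this model's custom template reads exactly these keys should be checked against chat_template.jinja.

```python
import json
import re
from typing import Callable, Dict, List

CALL_RE = re.compile(r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>", re.DOTALL)
ARG_RE = re.compile(r"(\w+):<escape>(.*?)<escape>", re.DOTALL)


def get_weather(location: str) -> dict:
    """Hypothetical stub; replace with a real weather lookup."""
    return {"location": location, "forecast": "sunny", "temp_c": 22}


# Registry mapping tool names (as declared to the model) to local Python functions.
TOOLS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(decoded: str, messages: List[dict]) -> List[dict]:
    """Execute every call found in `decoded` and return an extended copy of the history."""
    history = list(messages)
    for name, raw_args in CALL_RE.findall(decoded):
        args = dict(ARG_RE.findall(raw_args))
        fn = TOOLS.get(name)
        if fn is None:
            raise ValueError(f"model called unknown tool {name!r}")
        result = fn(**args)
        history.append({
            "role": "assistant",
            "tool_calls": [{"type": "function", "function": {"name": name, "arguments": args}}],
        })
        history.append({"role": "tool", "name": name, "content": json.dumps(result)})
    return history
```

Passing the returned history back through `tokenizer.apply_chat_template(..., tools=tools, add_generation_prompt=True)` and generating again would let the model answer from the tool result.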

Chat Template

The custom chat template (in tokenizer_config.json and chat_template.jinja) supports these roles:

  • developer / system: system instructions and tool declarations
  • user: user messages
  • model / assistant: model responses, including tool_calls
  • tool: tool execution results
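When messages come from different clients, the role aliases above can end up mixed in one conversation. The sketch below validates roles before apply_chat_template is called and maps each alias pair onto one name. The template accepts both spellings, so choosing system and assistant as the canonical names is an arbitrary assumption. normalize_roles is a hypothetical helper.

```python
from typing import Dict, List

# Each supported role (and alias) mapped to one canonical name (canonical choice is arbitrary).
ROLE_ALIASES: Dict[str, str] = {
    "developer": "system",
    "system": "system",
    "user": "user",
    "model": "assistant",
    "assistant": "assistant",
    "tool": "tool",
}


def normalize_roles(messages: List[dict]) -> List[dict]:
    """Return copies of the messages with canonical roles; reject unsupported roles."""
    normalized = []
    for i, msg in enumerate(messages):
        role = ROLE_ALIASES.get(msg.get("role"))
        if role is None:
            raise ValueError(f"message {i} has unsupported role {msg.get('role')!r}")
        normalized.append({**msg, "role": role})
    return normalized
```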

Converting to .litertlm

Use the LiteRT-LM conversion tools to package the model for on-device deployment:

# The chat_template.jinja is included in this repo
python scripts/convert-to-litertlm.py \
  --model_dir kontextdev/agent-gemma \
  --output agent-gemma.litertlm

Files

  • model-*.safetensors: merged model weights (bfloat16)
  • tokenizer_config.json: tokenizer config with the embedded chat template
  • chat_template.jinja: standalone chat template file
  • config.json: model architecture config
  • checkpoint-*: training checkpoints (LoRA)
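The template is stored in two files, so the copies can drift apart after edits. The sketch below compares them in a local copy of the repo before conversion. It ignores only a trailing-newline difference. templates_match is a hypothetical helper.

```python
import json
from pathlib import Path


def templates_match(model_dir: str) -> bool:
    """True if chat_template.jinja equals the chat_template embedded in tokenizer_config.json."""
    root = Path(model_dir)
    standalone = (root / "chat_template.jinja").read_text(encoding="utf-8")
    config = json.loads((root / "tokenizer_config.json").read_text(encoding="utf-8"))
    embedded = config.get("chat_template")
    if not isinstance(embedded, str):
        # Missing key or a list of named templates: treat as a mismatch in this sketch.
        return False
    # Editors often append a final newline to .jinja files; don't count that as drift.
    return standalone.rstrip("\n") == embedded.rstrip("\n")
```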

License

This model inherits the Gemma license from the base model.
