---
base_model: google/functiongemma-270m-it
tags:
- function-calling
- mobile-actions
- gemma
library_name: transformers
datasets:
- google/mobile-actions
language:
- en
license: gemma
---

# FunctionGemma 270M for Mobile Actions

This model is a fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it) specialized for mobile assistant actions. It has been trained on the [google/mobile-actions](https://huggingface.co/datasets/google/mobile-actions) dataset to perform structured function calling for common mobile device tasks.

## Model Description

**Base Model**: `google/functiongemma-270m-it` - A 270M parameter instruction-tuned model from Google's FunctionGemma family, designed for function calling tasks.

**Specialization**: Mobile assistant actions including:
- Calendar event management
- Email composition and sending
- Contact creation
- Flashlight control
- Wi-Fi settings navigation
- Map location display

**Training Objective**: The model learns to emit structured function calls in the format `call:<function_name>{arg1:value1,arg2:value2,...}` instead of natural language responses.

## Supported Functions

The model is optimized to call these mobile action functions:

1. **`turn_on_flashlight()`** - Turns the device flashlight on
2. **`turn_off_flashlight()`** - Turns the device flashlight off
3. **`create_contact(first_name, last_name, phone_number?, email?)`** - Creates a new contact
4. **`send_email(to, subject, body?)`** - Sends an email to a recipient
5. **`show_map(query)`** - Displays a location on the map by name, business, or address
6. **`open_wifi_settings()`** - Opens the Wi-Fi settings screen
7. **`create_calendar_event(title, datetime)`** - Creates a calendar event (datetime in ISO format: `YYYY-MM-DDTHH:MM:SS`)

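The usage example in this card spells out tool schemas for `create_calendar_event` and `send_email` only. As a rough sketch, the five remaining signatures could be expressed in the same schema shape as follows. The helper `make_tool` and all description strings are illustrative assumptions, not the wording used in the dataset.

```python
def make_tool(name, description, params=None, required=None):
    """Build a tool schema in the {"function": {...}} shape used in the usage example.

    `make_tool` is a hypothetical helper; all parameters are typed as STRING.
    """
    properties = {
        arg: {"type": "STRING", "description": text}
        for arg, text in (params or {}).items()
    }
    return {
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "OBJECT",
                "properties": properties,
                "required": list(required or []),
            },
        }
    }


# Description strings are assumptions made for illustration.
extra_tools = [
    make_tool("turn_on_flashlight", "Turns the device flashlight on."),
    make_tool("turn_off_flashlight", "Turns the device flashlight off."),
    make_tool(
        "create_contact",
        "Creates a new contact.",
        params={
            "first_name": "The contact's first name.",
            "last_name": "The contact's last name.",
            "phone_number": "The contact's phone number.",
            "email": "The contact's email address.",
        },
        required=["first_name", "last_name"],  # phone_number and email are optional
    ),
    make_tool(
        "show_map",
        "Displays a location on the map.",
        params={"query": "A place name, business, or address."},
        required=["query"],
    ),
    make_tool("open_wifi_settings", "Opens the Wi-Fi settings screen."),
]
```

Appending `extra_tools` to the two schemas from the usage example would give the model all seven functions to choose from.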

## Training Details

### Training Data

- **Dataset**: [google/mobile-actions](https://huggingface.co/datasets/google/mobile-actions)
- **Format**: JSONL with prompt-completion pairs
- **Splits**:
  - Training set: examples with `"metadata": "train"`
  - Evaluation set: examples with `"metadata": "eval"`
- **Preprocessing**: Converted to TRL prompt-completion format with `completion_only_loss=True`

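A minimal sketch of the split described above, assuming each JSONL line is an object with a top-level `"metadata"` field set to `"train"` or `"eval"`. The `"text"` field in the sample records is a placeholder, not the dataset's real schema, and `split_by_metadata` is a hypothetical helper.

```python
import json


def split_by_metadata(jsonl_lines):
    """Partition JSONL records into (train, eval) lists by their "metadata" value."""
    train, evaluation = [], []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("metadata") == "train":
            train.append(record)
        elif record.get("metadata") == "eval":
            evaluation.append(record)
    return train, evaluation


# Toy records; the "text" field is a placeholder for illustration only.
sample_lines = [
    '{"metadata": "train", "text": "example A"}',
    '{"metadata": "eval", "text": "example B"}',
    '{"metadata": "train", "text": "example C"}',
]
train_records, eval_records = split_by_metadata(sample_lines)
print(len(train_records), len(eval_records))
```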

### Training Procedure

Fine-tuned using Hugging Face [TRL (Transformer Reinforcement Learning)](https://huggingface.co/docs/trl) with the `SFTTrainer`.

**Training Configuration**:
- **Epochs**: 4
- **Batch size**: 8 per device
- **Gradient accumulation steps**: 4
- **Learning rate**: 5e-5
- **Scheduler**: Cosine
- **Max sequence length**: 997 tokens (based on longest example: 897 tokens)
- **Optimizer**: AdamW (fused)
- **Precision**: bfloat16
- **Gradient checkpointing**: Enabled
- **Completion only loss**: True (trains only on model outputs, not prompts)

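The settings above can be collected into a plain dictionary, which also makes the effective batch size explicit. This is a sketch only: the key names are assumed to match `SFTConfig` fields in the TRL version listed in this card and may differ in other releases, and `output_dir` in the commented line is a placeholder.

```python
# Hyperparameters from the list above, keyed by the (assumed) SFTConfig field names.
sft_kwargs = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "cosine",
    "max_length": 997,
    "optim": "adamw_torch_fused",
    "bf16": True,
    "gradient_checkpointing": True,
    "completion_only_loss": True,
}

# Sequences contributing to each optimizer step on a single device.
effective_batch_size = (
    sft_kwargs["per_device_train_batch_size"]
    * sft_kwargs["gradient_accumulation_steps"]
)
print("Effective batch size per device:", effective_batch_size)

# With TRL installed, the dictionary could be passed along these lines:
# from trl import SFTConfig
# config = SFTConfig(output_dir="mobile-actions-sft", **sft_kwargs)
```

With 8 sequences per device and 4 accumulation steps, each optimizer update sees 32 sequences on a single GPU.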

**Training Infrastructure**:
- **Hardware**: Google Colab A100 GPU
- **Training time**: ~60 minutes for 4 epochs
- **Library versions**: transformers==4.57.1, trl==0.25.1, datasets==4.4.1

### Training Results

Final metrics after 4 epochs:

| Step | Training Loss | Validation Loss | Mean Token Accuracy |
|------|---------------|-----------------|---------------------|
| 500  | 0.008800      | 0.013452        | 0.996691            |

The model achieved 99.67% token-level accuracy on the validation set, showing significant improvement over the base model's mobile action capabilities.

## Intended Use

This model is designed for:
- **Mobile AI assistants** that need to execute device actions based on user requests
- **Voice-controlled mobile applications**
- **Conversational agents** that interact with mobile device features
- **On-device AI** applications (can be converted to `.litertlm` format for deployment)

## How to Use

### Basic Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import json

# Load model and tokenizer
model_id = "jprtr/google_mobile_actions"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    attn_implementation="eager",
    torch_dtype="auto",
)

# Create pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Define the tools (function schemas)
tools = [
    {
        "function": {
            "name": "create_calendar_event",
            "description": "Creates a new calendar event.",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "title": {"type": "STRING", "description": "The title of the event."},
                    "datetime": {"type": "STRING", "description": "The date and time in YYYY-MM-DDTHH:MM:SS format."},
                },
                "required": ["title", "datetime"],
            },
        }
    },
    {
        "function": {
            "name": "send_email",
            "description": "Sends an email.",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "to": {"type": "STRING", "description": "The recipient email address."},
                    "subject": {"type": "STRING", "description": "The email subject."},
                    "body": {"type": "STRING", "description": "The email body."},
                },
                "required": ["to", "subject"],
            },
        }
    },
    # ... add other function definitions
]

# Create messages
messages = [
    {
        "role": "developer",
        "content": (
            "Current date and time given in YYYY-MM-DDTHH:MM:SS format: 2025-07-10T19:06:29\n"
            "Day of week is Thursday\n"
            "You are a model that can do function calling with the following functions\n"
        ),
    },
    {
        "role": "user",
        "content": 'Schedule a "team meeting" tomorrow at 4pm.',
    },
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate
output = pipe(prompt, max_new_tokens=200)[0]["generated_text"][len(prompt):].strip()
print("Model output:", output)
# Example output: call:create_calendar_event{datetime:2025-07-11T16:00:00,title:team meeting}
```

### Parsing Function Calls

The model outputs function calls in a simple format:
```
call:<function_name>{arg1:value1,arg2:value2,...}
```

For multiple function calls, they appear sequentially:
```
call:create_calendar_event{datetime:2025-07-15T10:30:00,title:Dental Checkup}
call:send_email{to:user@example.com,subject:Appointment,body:See you there!}
```

You can parse these by:
1. Splitting on `call:` to identify individual function calls
2. Extracting the function name (text before `{`)
3. Parsing the arguments block (content within `{}`)

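The three steps above can be sketched with two regular expressions. This is a naive parser under stated assumptions: argument values contain no `}` and no `,name:` sequences, and each key is split from its value at the first colon so that timestamps such as `10:30:00` stay intact. `parse_function_calls` is a hypothetical helper, not part of any library.

```python
import re

# One call: "call:" + function name + "{...}" (non-greedy up to the first "}").
CALL_RE = re.compile(r"call:(\w+)\{(.*?)\}", re.DOTALL)
# Split arguments only on commas that are directly followed by "name:".
ARG_SPLIT_RE = re.compile(r",(?=\w+:)")


def parse_function_calls(text):
    """Return a list of {"name": ..., "arguments": {...}} dicts found in `text`."""
    calls = []
    for match in CALL_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        arguments = {}
        if body.strip():
            for pair in ARG_SPLIT_RE.split(body):
                # Split at the first colon only; values may contain colons.
                key, _, value = pair.partition(":")
                arguments[key.strip()] = value.strip()
        calls.append({"name": name, "arguments": arguments})
    return calls


model_output = (
    "call:create_calendar_event{datetime:2025-07-15T10:30:00,title:Dental Checkup}\n"
    "call:send_email{to:user@example.com,subject:Appointment,body:See you there!}"
)
for call in parse_function_calls(model_output):
    print(call["name"], call["arguments"])
```

All values come back as strings. Free-text arguments that break the assumptions above (for example an email body containing `}`) would need a stricter grammar.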

## Evaluation

The model was evaluated on the held-out evaluation split of the mobile-actions dataset (examples with `"metadata": "eval"`). Evaluation metrics compare the model's function call outputs against the ground truth labels by exact string match.

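An exact-string-match score of this kind can be computed in a few lines. The sketch below is not the evaluation script behind this card; `exact_match_accuracy` is a hypothetical helper, and stripping surrounding whitespace before comparing is an assumption.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference string.

    Leading and trailing whitespace is ignored (an assumption of this sketch).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        pred.strip() == ref.strip() for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Toy data for illustration; these are not results from the model.
references = [
    "call:turn_on_flashlight{}",
    "call:show_map{query:Eiffel Tower}",
    "call:open_wifi_settings{}",
]
predictions = [
    "call:turn_on_flashlight{}",
    "call:show_map{query:eiffel tower}",  # casing differs, so no exact match
    "call:open_wifi_settings{}\n",
]
score = exact_match_accuracy(predictions, references)
print(f"Exact match: {score:.2%}")
```

Exact matching is strict: an output that differs only in casing or argument order is scored as wrong.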

**Key Observations**:
- The base FunctionGemma 270M model often fails to call appropriate functions for mobile actions
- After fine-tuning, the model reliably generates correct function calls with proper argument formatting
- Token-level accuracy on the validation set: **99.67%**

## Limitations

- The model is specialized for the 7 mobile action functions listed above and may not generalize well to other function calling tasks
- Date/time parsing relies on context provided in the developer message (current date/time must be specified)
- The model outputs may occasionally include variations in argument formatting that are semantically correct but don't exactly match the expected format
- This is a 270M parameter model, so while efficient for mobile deployment, it may have lower accuracy than larger models

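Because relative dates such as "tomorrow" depend on the developer message, one option is to build that message from the current clock on every request. The sketch below reuses the wording from the usage example in this card. `build_developer_message` is a hypothetical helper, and a naive (timezone-free) local `datetime` is assumed.

```python
from datetime import datetime

# English day names, indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]


def build_developer_message(now):
    """Return a developer message carrying the current date, time, and weekday."""
    timestamp = now.isoformat(timespec="seconds")  # e.g. 2025-07-10T19:06:29
    return {
        "role": "developer",
        "content": (
            f"Current date and time given in YYYY-MM-DDTHH:MM:SS format: {timestamp}\n"
            f"Day of week is {DAY_NAMES[now.weekday()]}\n"
            "You are a model that can do function calling with the following functions\n"
        ),
    }


# A fixed timestamp keeps the example reproducible; use datetime.now() in practice.
developer_message = build_developer_message(datetime(2025, 7, 10, 19, 6, 29))
print(developer_message["content"])
```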

## On-Device Deployment

The model can be converted to `.litertlm` format for on-device deployment using `ai-edge-torch`. See the [training notebook](https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb) for conversion instructions.

The converted model can be deployed on:
- Android devices via [Google AI Edge](https://ai.google.dev/edge)
- [AI Edge Gallery app](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)

## Training Notebook

For full training details, hyperparameter tuning, and evaluation, see the original Colab notebook:
[Finetune FunctionGemma 270M for Mobile Actions](https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb)

## Citation

If you use this model, please cite the original FunctionGemma paper and the Google Mobile Actions dataset:

```bibtex
@misc{functiongemma2024,
  title={FunctionGemma: Function Calling for Gemma Models},
  author={Google},
  year={2024},
  url={https://huggingface.co/google/functiongemma-270m-it}
}
```

## License

This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) for details.