---
base_model: google/functiongemma-270m-it
tags:
- function-calling
- mobile-actions
- gemma
library_name: transformers
datasets:
- google/mobile-actions
language:
- en
license: gemma
---

# FunctionGemma 270M for Mobile Actions

This model is a fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it) specialized for mobile assistant actions. It has been trained on the [google/mobile-actions](https://huggingface.co/datasets/google/mobile-actions) dataset to perform structured function calling for common mobile device tasks.

## Model Description

**Base Model**: `google/functiongemma-270m-it` - A 270M parameter instruction-tuned model from Google's FunctionGemma family, designed for function calling tasks.

**Specialization**: Mobile assistant actions including:
- Calendar event management
- Email composition and sending
- Contact creation
- Flashlight control
- Wi-Fi settings navigation  
- Map location display

**Training Objective**: The model learns to emit structured function calls in the format `call:<function_name>{arg1:value1,arg2:value2,...}` instead of natural language responses.

## Supported Functions

The model is optimized to call these mobile action functions:

1. **`turn_on_flashlight()`** - Turns the device flashlight on
2. **`turn_off_flashlight()`** - Turns the device flashlight off
3. **`create_contact(first_name, last_name, phone_number?, email?)`** - Creates a new contact
4. **`send_email(to, subject, body?)`** - Sends an email to a recipient
5. **`show_map(query)`** - Displays a location on the map by name, business, or address
6. **`open_wifi_settings()`** - Opens the Wi-Fi settings screen
7. **`create_calendar_event(title, datetime)`** - Creates a calendar event (datetime in ISO format: `YYYY-MM-DDTHH:MM:SS`)

## Training Details

### Training Data

- **Dataset**: [google/mobile-actions](https://huggingface.co/datasets/google/mobile-actions)
- **Format**: JSONL with prompt-completion pairs
- **Splits**: 
  - Training set: examples with `"metadata": "train"`
  - Evaluation set: examples with `"metadata": "eval"`
- **Preprocessing**: Converted to TRL prompt-completion format with `completion_only_loss=True`
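The preprocessing step above can be sketched as follows. This is an illustrative sketch, not the exact training script: the field names `prompt`, `completion`, and `metadata` are assumed from the description above rather than taken from the dataset's documented schema.

```python
# Sketch: split JSONL-style records by the "metadata" field and map each one
# to TRL's {"prompt", "completion"} format. Field names are assumptions based
# on the split description above.

def to_prompt_completion(records):
    """Split records into train/eval lists of {"prompt", "completion"} dicts."""
    splits = {"train": [], "eval": []}
    for rec in records:
        split = rec.get("metadata")
        if split in splits:
            splits[split].append(
                {"prompt": rec["prompt"], "completion": rec["completion"]}
            )
    return splits

# Toy records standing in for the real dataset
records = [
    {"prompt": "Turn on the flashlight",
     "completion": "call:turn_on_flashlight{}", "metadata": "train"},
    {"prompt": "Open Wi-Fi settings",
     "completion": "call:open_wifi_settings{}", "metadata": "eval"},
]
splits = to_prompt_completion(records)
print(len(splits["train"]), len(splits["eval"]))  # 1 1
```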

### Training Procedure

Fine-tuned using Hugging Face [TRL (Transformer Reinforcement Learning)](https://huggingface.co/docs/trl) with the `SFTTrainer`.

**Training Configuration**:
- **Epochs**: 4
- **Batch size**: 8 per device
- **Gradient accumulation steps**: 4
- **Learning rate**: 5e-5
- **Scheduler**: Cosine
- **Max sequence length**: 997 tokens (based on longest example: 897 tokens)
- **Optimizer**: AdamW (fused)
- **Precision**: bfloat16
- **Gradient checkpointing**: Enabled
- **Completion only loss**: True (trains only on model outputs, not prompts)
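Under the pinned library versions, the configuration above maps onto TRL's `SFTConfig`/`SFTTrainer` roughly as sketched below. This is a hedged reconstruction, not the original training script: parameter names follow `trl==0.25.1`, and `train_ds`/`eval_ds` stand for the prompt-completion datasets described under Training Data.

```python
# Sketch of the SFTTrainer setup implied by the configuration above
# (assumes trl==0.25.1 parameter names; train_ds/eval_ds are placeholders).
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="functiongemma-mobile-actions",
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_length=997,                    # longest example is 897 tokens
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
    completion_only_loss=True,         # loss on completions only, not prompts
)

trainer = SFTTrainer(
    model="google/functiongemma-270m-it",
    args=config,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```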

**Training Infrastructure**:
- **Hardware**: Google Colab A100 GPU
- **Training time**: ~60 minutes for 4 epochs
- **Library versions**: transformers==4.57.1, trl==0.25.1, datasets==4.4.1

### Training Results

Final metrics after 4 epochs:

| Step | Training Loss | Validation Loss | Mean Token Accuracy |
|------|---------------|-----------------|---------------------|
| 500  | 0.008800      | 0.013452        | 0.996691           |

The model achieved 99.67% token-level accuracy on the validation set, a substantial improvement over the base model on these mobile actions.

## Intended Use

This model is designed for:
- **Mobile AI assistants** that need to execute device actions based on user requests
- **Voice-controlled mobile applications** 
- **Conversational agents** that interact with mobile device features
- **On-device AI** applications (can be converted to `.litertlm` format for deployment)

## How to Use

### Basic Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import json

# Load model and tokenizer
model_id = "jprtr/google_mobile_actions"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    attn_implementation="eager",
    torch_dtype="auto",
)

# Create pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Define the tools (function schemas)
tools = [
    {
        "function": {
            "name": "create_calendar_event",
            "description": "Creates a new calendar event.",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "title": {"type": "STRING", "description": "The title of the event."},
                    "datetime": {"type": "STRING", "description": "The date and time in YYYY-MM-DDTHH:MM:SS format."},
                },
                "required": ["title", "datetime"],
            },
        }
    },
    {
        "function": {
            "name": "send_email",
            "description": "Sends an email.",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "to": {"type": "STRING", "description": "The recipient email address."},
                    "subject": {"type": "STRING", "description": "The email subject."},
                    "body": {"type": "STRING", "description": "The email body."},
                },
                "required": ["to", "subject"],
            },
        }
    },
    # ... add other function definitions
]

# Create messages
messages = [
    {
        "role": "developer",
        "content": (
            "Current date and time given in YYYY-MM-DDTHH:MM:SS format: 2025-07-10T19:06:29\n"
            "Day of week is Thursday\n"
            "You are a model that can do function calling with the following functions\n"
        ),
    },
    {
        "role": "user",
        "content": 'Schedule a "team meeting" tomorrow at 4pm.',
    },
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate
output = pipe(prompt, max_new_tokens=200)[0]["generated_text"][len(prompt):].strip()
print("Model output:", output)
# Example output: call:create_calendar_event{datetime:2025-07-11T16:00:00,title:team meeting}
```

### Parsing Function Calls

The model outputs function calls in a simple format:
```
call:<function_name>{arg1:value1,arg2:value2,...}
```

For multiple function calls, they appear sequentially:
```
call:create_calendar_event{datetime:2025-07-15T10:30:00,title:Dental Checkup}
call:send_email{to:user@example.com,subject:Appointment,body:See you there!}
```

You can parse these by:
1. Splitting on `call:` to identify individual function calls
2. Extracting the function name (text before `{`)
3. Parsing the arguments block (content within `{}`)
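The three steps above can be implemented with a short parser. This is a minimal sketch: it splits each argument on the first `:` only, so ISO datetimes survive intact, but values that themselves contain commas (e.g. free-form email bodies) would need escaping or a more robust grammar.

```python
import re

def parse_function_calls(text):
    """Extract (name, args) pairs from 'call:name{k:v,...}' model output.

    Splits each argument on the first ':' only, so colon-containing
    values such as ISO datetimes are preserved. Comma-containing values
    are not handled and would need a more robust grammar.
    """
    calls = []
    for match in re.finditer(r"call:(\w+)\{([^}]*)\}", text):
        name, body = match.group(1), match.group(2)
        args = {}
        for pair in body.split(","):
            if pair:
                key, value = pair.split(":", 1)
                args[key.strip()] = value.strip()
        calls.append((name, args))
    return calls

output = (
    "call:create_calendar_event{datetime:2025-07-15T10:30:00,title:Dental Checkup}\n"
    "call:send_email{to:user@example.com,subject:Appointment,body:See you there!}"
)
for name, args in parse_function_calls(output):
    print(name, args)
```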

## Evaluation

The model was evaluated on the held-out test set from the mobile-actions dataset. Evaluation metrics compare exact string matching of the model's function call outputs against ground truth labels.

**Key Observations**:
- The base FunctionGemma 270M model often fails to call appropriate functions for mobile actions
- After fine-tuning, the model reliably generates correct function calls with proper argument formatting
- Token-level accuracy on the validation set: **99.67%**

## Limitations

- The model is specialized for the 7 mobile action functions listed above and may not generalize well to other function calling tasks
- Date/time parsing relies on context provided in the developer message (current date/time must be specified)
- The model outputs may occasionally include variations in argument formatting that are semantically correct but don't exactly match the expected format
- This is a 270M parameter model, so while efficient for mobile deployment, it may have lower accuracy than larger models

## On-Device Deployment

The model can be converted to `.litertlm` format for on-device deployment using `ai-edge-torch`. See the [training notebook](https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb) for conversion instructions.

The converted model can be deployed on:
- Android devices via [Google AI Edge](https://ai.google.dev/edge)
- [AI Edge Gallery app](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)

## Training Notebook

For full training details, hyperparameter tuning, and evaluation, see the original Colab notebook:
[Finetune FunctionGemma 270M for Mobile Actions](https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb)

## Citation

If you use this model, please cite the original FunctionGemma paper and the Google Mobile Actions dataset:

```bibtex
@misc{functiongemma2024,
  title={FunctionGemma: Function Calling for Gemma Models},
  author={Google},
  year={2024},
  url={https://huggingface.co/google/functiongemma-270m-it}
}
```

## License

This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) for details.