	---
	base_model: google/functiongemma-270m-it
	tags:
	- function-calling
	- mobile-actions
	- gemma
	library_name: transformers
	datasets:
	- google/mobile-actions
	language:
	- en
	license: gemma
	---

	# FunctionGemma 270M for Mobile Actions

	This model is a fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it) specialized for mobile assistant actions. It has been trained on the [google/mobile-actions](https://huggingface.co/datasets/google/mobile-actions) dataset to perform structured function calling for common mobile device tasks.

	## Model Description

	Base Model: `google/functiongemma-270m-it` - A 270M parameter instruction-tuned model from Google's FunctionGemma family, designed for function calling tasks.

	Specialization: Mobile assistant actions including:
	- Calendar event management
	- Email composition and sending
	- Contact creation
	- Flashlight control
	- Wi-Fi settings navigation
	- Map location display

	Training Objective: The model learns to emit structured function calls in the format `call:<function_name>{arg1:value1,arg2:value2,...}` instead of natural language responses.

	## Supported Functions

	The model is optimized to call these mobile action functions:

	1. `turn_on_flashlight()` - Turns the device flashlight on
	2. `turn_off_flashlight()` - Turns the device flashlight off
	3. `create_contact(first_name, last_name, phone_number?, email?)` - Creates a new contact
	4. `send_email(to, subject, body?)` - Sends an email to a recipient
	5. `show_map(query)` - Displays a location on the map by name, business, or address
	6. `open_wifi_settings()` - Opens the Wi-Fi settings screen
	7. `create_calendar_event(title, datetime)` - Creates a calendar event (datetime in ISO format: `YYYY-MM-DDTHH:MM:SS`)
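The signatures above can be turned into tool schemas in the same `{"function": {...}}` shape used under Basic Inference. The following is a minimal sketch, not the dataset's canonical definitions: the helpers `make_tool` and `string_param` are hypothetical, the description strings are paraphrased from the list above, and representing zero-argument functions with an empty `properties` object is an assumption.

```python
def string_param(description):
    """Hypothetical helper: one STRING parameter entry."""
    return {"type": "STRING", "description": description}


def make_tool(name, description, properties=None, required=None):
    """Hypothetical helper: build one tool schema in the README's shape."""
    return {
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "OBJECT",
                # Assumption: functions without arguments use empty containers.
                "properties": properties or {},
                "required": required or [],
            },
        }
    }


# Schemas for the functions not spelled out in the inference example.
remaining_tools = [
    make_tool("turn_on_flashlight", "Turns the device flashlight on."),
    make_tool("turn_off_flashlight", "Turns the device flashlight off."),
    make_tool(
        "create_contact",
        "Creates a new contact.",
        properties={
            "first_name": string_param("The contact's first name."),
            "last_name": string_param("The contact's last name."),
            "phone_number": string_param("The contact's phone number."),
            "email": string_param("The contact's email address."),
        },
        # Parameters marked with `?` in the list above are optional.
        required=["first_name", "last_name"],
    ),
    make_tool(
        "show_map",
        "Displays a location on the map by name, business, or address.",
        properties={"query": string_param("The location to look up.")},
        required=["query"],
    ),
    make_tool("open_wifi_settings", "Opens the Wi-Fi settings screen."),
]
```

Appending `remaining_tools` to the two schemas shown under Basic Inference would give the model all seven functions to choose from.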

	## Training Details

	### Training Data

	- Dataset: [google/mobile-actions](https://huggingface.co/datasets/google/mobile-actions)
	- Format: JSONL with prompt-completion pairs
	- Splits:
	  - Training set: examples with `"metadata": "train"`
	  - Evaluation set: examples with `"metadata": "eval"`
	- Preprocessing: Converted to TRL prompt-completion format with `completion_only_loss=True`
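The split rule described above can be sketched in plain Python. This is an illustration only: it assumes each JSONL line is an object with a top-level `"metadata"` field, and the `"prompt"` and `"completion"` fields in the sample records are hypothetical stand-ins for the real dataset columns.

```python
import json


def split_by_metadata(jsonl_lines):
    """Group JSONL records into train/eval lists using their "metadata" field."""
    splits = {"train": [], "eval": []}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        split = record.get("metadata")
        if split in splits:
            splits[split].append(record)
    return splits


# Hypothetical sample records; the real dataset has richer fields.
sample_lines = [
    '{"metadata": "train", "prompt": "Turn on the flashlight", "completion": "call:turn_on_flashlight{}"}',
    '{"metadata": "eval", "prompt": "Open Wi-Fi settings", "completion": "call:open_wifi_settings{}"}',
    '{"metadata": "train", "prompt": "Turn off the flashlight", "completion": "call:turn_off_flashlight{}"}',
]

splits = split_by_metadata(sample_lines)
print(len(splits["train"]), len(splits["eval"]))
```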

	### Training Procedure

	Fine-tuned using Hugging Face [TRL (Transformer Reinforcement Learning)](https://huggingface.co/docs/trl) with the `SFTTrainer`.

	Training Configuration:
	- Epochs: 4
	- Batch size: 8 per device
	- Gradient accumulation steps: 4
	- Learning rate: 5e-5
	- Scheduler: Cosine
	- Max sequence length: 997 tokens (based on longest example: 897 tokens)
	- Optimizer: AdamW (fused)
	- Precision: bfloat16
	- Gradient checkpointing: Enabled
	- Completion only loss: True (trains only on model outputs, not prompts)
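For reference, the configuration above maps roughly onto keyword arguments for TRL's `SFTConfig`. The sketch below keeps them in a plain dictionary so it runs without TRL installed. The argument names are my mapping rather than a copy of the training notebook; in particular, `max_length` is assumed to be the sequence-length argument in the TRL version listed in this README (older releases used a different name).

```python
# Keyword arguments mirroring the listed training configuration.
sft_kwargs = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "cosine",
    "max_length": 997,  # assumed argument name, see note above
    "optim": "adamw_torch_fused",
    "bf16": True,
    "gradient_checkpointing": True,
    "completion_only_loss": True,
}

# Examples contributing to each optimizer step, per device.
effective_batch_size = (
    sft_kwargs["per_device_train_batch_size"]
    * sft_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 32

# With TRL installed, these could be passed as:
#   from trl import SFTConfig
#   config = SFTConfig(output_dir="out", **sft_kwargs)  # "out" is a placeholder path
```

With a batch size of 8 and 4 accumulation steps, each optimizer update sees 32 examples per device.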

	Training Infrastructure:
	- Hardware: Google Colab A100 GPU
	- Training time: ~60 minutes for 4 epochs
	- Library versions: transformers==4.57.1, trl==0.25.1, datasets==4.4.1

	### Training Results

	Final metrics after 4 epochs:

	| Step | Training Loss | Validation Loss | Mean Token Accuracy |
	|------|---------------|-----------------|---------------------|
	| 500 | 0.008800 | 0.013452 | 0.996691 |

	The fine-tuned model reached 99.67% mean token accuracy (0.996691) on the evaluation set at step 500, a marked improvement over the base model on mobile actions.

	## Intended Use

	This model is designed for:
	- Mobile AI assistants that need to execute device actions based on user requests
	- Voice-controlled mobile applications
	- Conversational agents that interact with mobile device features
	- On-device AI applications (can be converted to `.litertlm` format for deployment)

	## How to Use

	### Basic Inference

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

	# Load model and tokenizer
	model_id = "jprtr/google_mobile_actions"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",
	    attn_implementation="eager",
	    torch_dtype="auto",
	)

	# Create pipeline
	pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

	# Define the tools (function schemas)
	tools = [
	    {
	        "function": {
	            "name": "create_calendar_event",
	            "description": "Creates a new calendar event.",
	            "parameters": {
	                "type": "OBJECT",
	                "properties": {
	                    "title": {"type": "STRING", "description": "The title of the event."},
	                    "datetime": {"type": "STRING", "description": "The date and time in YYYY-MM-DDTHH:MM:SS format."},
	                },
	                "required": ["title", "datetime"],
	            },
	        }
	    },
	    {
	        "function": {
	            "name": "send_email",
	            "description": "Sends an email.",
	            "parameters": {
	                "type": "OBJECT",
	                "properties": {
	                    "to": {"type": "STRING", "description": "The recipient email address."},
	                    "subject": {"type": "STRING", "description": "The email subject."},
	                    "body": {"type": "STRING", "description": "The email body."},
	                },
	                "required": ["to", "subject"],
	            },
	        }
	    },
	    # ... add other function definitions
	]

	# Create messages; the developer message supplies the current date and time
	messages = [
	    {
	        "role": "developer",
	        "content": (
	            "Current date and time given in YYYY-MM-DDTHH:MM:SS format: 2025-07-10T19:06:29\n"
	            "Day of week is Thursday\n"
	            "You are a model that can do function calling with the following functions\n"
	        ),
	    },
	    {
	        "role": "user",
	        "content": 'Schedule a "team meeting" tomorrow at 4pm.',
	    },
	]

	# Apply chat template (returns the prompt as a string)
	prompt = tokenizer.apply_chat_template(
	    messages,
	    tools=tools,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	# Generate, then strip the echoed prompt from the returned text
	output = pipe(prompt, max_new_tokens=200)[0]["generated_text"][len(prompt):].strip()
	print("Model output:", output)
	# Example output: call:create_calendar_event{datetime:2025-07-11T16:00:00,title:team meeting}
	```

	### Parsing Function Calls

	The model outputs function calls in a simple format:
	```
	call:<function_name>{arg1:value1,arg2:value2,...}
	```

	For multiple function calls, they appear sequentially:
	```
	call:create_calendar_event{datetime:2025-07-15T10:30:00,title:Dental Checkup}
	call:send_email{to:user@example.com,subject:Appointment,body:See you there!}
	```

	You can parse these by:
	1. Splitting on `call:` to identify individual function calls
	2. Extracting the function name (text before `{`)
	3. Parsing the arguments block (content within `{}`)
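The three steps above can be sketched with regular expressions. This is a heuristic parser written for the simple `call:<function_name>{...}` format described in this section, not an official one: the name `parse_function_calls` is hypothetical, all argument values are returned as strings, and it assumes values contain no `}` and no `,key:` sequences.

```python
import re

# One call: "call:", a function name, then an argument block in braces.
CALL_RE = re.compile(r"call:(\w+)\{(.*?)\}", re.DOTALL)
# Split arguments only on commas that are directly followed by "key:".
ARG_SPLIT_RE = re.compile(r",(?=[A-Za-z_]\w*:)")


def parse_function_calls(text):
    """Return a list of {"name": ..., "arguments": {...}} dicts found in text."""
    calls = []
    for name, arg_block in CALL_RE.findall(text):
        arguments = {}
        if arg_block.strip():
            for pair in ARG_SPLIT_RE.split(arg_block):
                # Split on the first colon only, so datetimes stay intact.
                key, _, value = pair.partition(":")
                arguments[key.strip()] = value.strip()
        calls.append({"name": name, "arguments": arguments})
    return calls


model_output = (
    "call:create_calendar_event{datetime:2025-07-15T10:30:00,title:Dental Checkup}\n"
    "call:send_email{to:user@example.com,subject:Appointment,body:See you there!}"
)
for call in parse_function_calls(model_output):
    print(call["name"], call["arguments"])
```

Splitting on the first colon matters because ISO datetimes such as `2025-07-15T10:30:00` contain colons of their own. A production parser would likely need escaping rules for free-text arguments such as email bodies.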

	## Evaluation

	The model was evaluated on the held-out evaluation split of the mobile-actions dataset. Each generated function call is scored by exact string match against the ground-truth label.

	Key Observations:
	- The base FunctionGemma 270M model often fails to call appropriate functions for mobile actions
	- After fine-tuning, the model reliably generates correct function calls with proper argument formatting
	- Token-level accuracy on the validation set: 99.67%
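Exact string matching, as described above, can be sketched as follows. The function name `exact_match_accuracy` is hypothetical, and stripping surrounding whitespace before comparing is an assumption rather than a documented detail of the evaluation.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference string exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Assumption: leading/trailing whitespace is ignored.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


references = [
    "call:turn_on_flashlight{}",
    "call:open_wifi_settings{}",
    "call:create_calendar_event{datetime:2025-07-11T16:00:00,title:team meeting}",
]
predictions = [
    "call:turn_on_flashlight{}",
    "call:open_wifi_settings{} ",
    # Same arguments in a different order: counted as a miss.
    "call:create_calendar_event{title:team meeting,datetime:2025-07-11T16:00:00}",
]
score = exact_match_accuracy(predictions, references)
print(f"{score:.2f}")
```

The third prediction shows why exact match is strict: reordering the arguments leaves the call semantically the same, yet it is scored as incorrect.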

	## Limitations

	- The model is specialized for the 7 mobile action functions listed above and may not generalize well to other function calling tasks
	- Date/time parsing relies on context provided in the developer message (current date/time must be specified)
	- Model outputs may occasionally vary in argument formatting in ways that are semantically correct but do not exactly match the expected format
	- This is a 270M parameter model, so while efficient for mobile deployment, it may have lower accuracy than larger models
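Because relative dates depend on the developer message, that message needs to be rebuilt for each request. The following is a minimal sketch that reproduces the wording used in the inference example; the helper name `build_developer_message` is hypothetical, and whether the model tolerates different wording is not something this README establishes.

```python
from datetime import datetime

# Explicit English names avoid locale-dependent strftime("%A") output.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_developer_message(now):
    """Build the developer message for a given datetime."""
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "role": "developer",
        "content": (
            f"Current date and time given in YYYY-MM-DDTHH:MM:SS format: {timestamp}\n"
            f"Day of week is {WEEKDAYS[now.weekday()]}\n"
            "You are a model that can do function calling with the following functions\n"
        ),
    }


# Fixed timestamp for illustration; use datetime.now() in an application.
message = build_developer_message(datetime(2025, 7, 10, 19, 6, 29))
print(message["content"])
```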

	## On-Device Deployment

	The model can be converted to `.litertlm` format for on-device deployment using `ai-edge-torch`. See the [training notebook](https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb) for conversion instructions.

	The converted model can be deployed on:
	- Android devices via [Google AI Edge](https://ai.google.dev/edge)
	- [AI Edge Gallery app](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)

	## Training Notebook

	For full training details, hyperparameter tuning, and evaluation, see the original Colab notebook:
	[Finetune FunctionGemma 270M for Mobile Actions](https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb)

	## Citation

	If you use this model, please cite the original FunctionGemma paper and the Google Mobile Actions dataset:

	```bibtex
	@misc{functiongemma2024,
	  title={FunctionGemma: Function Calling for Gemma Models},
	  author={Google},
	  year={2024},
	  url={https://huggingface.co/google/functiongemma-270m-it}
	}
	```

	## License

	This model is released under the Gemma license. See the [Gemma Terms of Use](https://ai.google.dev/gemma/terms) for details.