---
library_name: transformers
tags: []
---

language:
- ar
tags:
- function-calling
- tool-use
- arabic
- instruction-tuning
- gemma
- transformers
license: apache-2.0
base_model: google/functiongemma-270m-it
---

# FunctionGemma-270M Arabic Tool Use

This model is a finetuned version of **`google/functiongemma-270m-it`** for Arabic **tool use / function calling** across multiple dialects and domains.

It is trained to produce **exactly one tool call** when a tool is required, using **FunctionGemma-native tool formatting** (special function-call tokens) and structured JSON arguments.

## Base model
- `google/functiongemma-270m-it`

## Dataset
- `metga97/arabic-tooluse-functiongemma-v1`

## What the model outputs

When a tool is required, generation should include a FunctionGemma tool call pattern such as:

- `<start_function_call>call:TOOL_NAME{ ...json args... }<end_function_call>`

For non-tool requests, it returns a short Arabic reply.

## Evaluation (by slang / dialect)

Evaluated on the test split of `metga97/arabic-tooluse-functiongemma-v1`.

### Overall
- Parsed OK rate: **0.891**
- Tool name accuracy: **0.9921**
- Strict EM: **0.6564**
- Key-F1 (avg): **0.9925**
- Missed-call rate: **0.0064**
- False-call rate (negatives): **0.0**

### Strict EM by slang / dialect
- **Egyptian**: 0.6791 (denom_calls: 1069)
- **Gulf**: 0.6237 (denom_calls: 1172)
- **Levantine**: 0.6558 (denom_calls: 706)
- **MSA**: 0.6804 (denom_calls: 1408)
- **Maghrebi**: 0.5455 (denom_calls: 176)

### Strict EM by domain
- banking_finance: 0.6255 (denom_calls: 542)
- ecommerce: 0.64 (denom_calls: 550)
- government_services: 0.7651 (denom_calls: 613)
- healthcare: 0.5754 (denom_calls: 577)
- islamic_services: 0.7119 (denom_calls: 597)
- travel: 0.6028 (denom_calls: 564)
- utilities: 0.4652 (denom_calls: 561)
- weather: 0.8653 (denom_calls: 527)

## Inference (important)

### 1) Use left padding for decoder-only generation
Set:
- `tokenizer.padding_side = "left"`
- `tokenizer.pad_token = tokenizer.eos_token` (if missing)

### 2) Pass tools via `apply_chat_template(..., tools=tools_list)`
This is critical for FunctionGemma-style function calling.

Example outline:
1. Select a tool subset for the request (domain pack + deterministic sampling).
2. Build prompt with `apply_chat_template` including `tools=tools_list`.
3. `generate()` deterministically (`do_sample=False`, `temperature=0.0`).
4. Parse tool call tokens and arguments.

## Known limitations / improvement ideas

- Some outputs may translate slot values into English (e.g., “Abu Dhabi”, “ID renewal”).
  - Mitigations: stronger developer prompt constraints, post-processing, adding explicit anti-translation supervision, and/or filtering/rebalancing training examples where values are English.
- Parsed OK < 1.0: you can improve formatting consistency with:
  - longer training
  - slightly stronger prompt
  - adding more negative/no-tool examples with explicit non-tool responses