---
library_name: transformers
language:
- ar
tags:
- function-calling
- tool-use
- arabic
- instruction-tuning
- gemma
- transformers
license: apache-2.0
base_model: google/functiongemma-270m-it
---
# FunctionGemma-270M Arabic Tool Use
This model is a fine-tuned version of **`google/functiongemma-270m-it`** for Arabic **tool use / function calling** across multiple dialects and domains.
It is trained to produce **exactly one tool call** when a tool is required, using **FunctionGemma-native tool formatting** (special function-call tokens) and structured JSON arguments.
## Base model
- `google/functiongemma-270m-it`
## Dataset
- `metga97/arabic-tooluse-functiongemma-v1`
## What the model outputs
When a tool is required, generation should include a FunctionGemma tool call pattern such as:
- `<start_function_call>call:TOOL_NAME{ ...json args... }<end_function_call>`
For non-tool requests, it returns a short Arabic reply.
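For illustration, a completed call for a hypothetical `get_weather` tool (the tool name and argument are invented for this example, not taken from the training data) could look like:
```
<start_function_call>call:get_weather{"city": "القاهرة"}<end_function_call>
```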
## Evaluation (by slang / dialect)
Evaluated on the test split of `metga97/arabic-tooluse-functiongemma-v1`.
### Overall
- Parsed OK rate: **0.891**
- Tool name accuracy: **0.9921**
- Strict EM: **0.6564**
- Key-F1 (avg): **0.9925**
- Missed-call rate: **0.0064**
- False-call rate (negatives): **0.0**
### Strict EM by slang / dialect
- **Egyptian**: 0.6791 (denom_calls: 1069)
- **Gulf**: 0.6237 (denom_calls: 1172)
- **Levantine**: 0.6558 (denom_calls: 706)
- **MSA**: 0.6804 (denom_calls: 1408)
- **Maghrebi**: 0.5455 (denom_calls: 176)
### Strict EM by domain
- banking_finance: 0.6255 (denom_calls: 542)
- ecommerce: 0.64 (denom_calls: 550)
- government_services: 0.7651 (denom_calls: 613)
- healthcare: 0.5754 (denom_calls: 577)
- islamic_services: 0.7119 (denom_calls: 597)
- travel: 0.6028 (denom_calls: 564)
- utilities: 0.4652 (denom_calls: 561)
- weather: 0.8653 (denom_calls: 527)
## Inference (important)
### 1) Use left padding for decoder-only generation
Set:
- `tokenizer.padding_side = "left"`
- `tokenizer.pad_token = tokenizer.eos_token` (if missing)
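A minimal setup sketch (the repository id below is a placeholder for wherever this checkpoint is hosted):
```python
from transformers import AutoTokenizer

model_id = "your-username/functiongemma-270m-arabic-tooluse"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"              # decoder-only models need left padding for batched generation
if tokenizer.pad_token is None:              # reuse EOS as PAD only if no pad token is defined
    tokenizer.pad_token = tokenizer.eos_token
```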
### 2) Pass tools via `apply_chat_template(..., tools=tools_list)`
This is critical for FunctionGemma-style function calling.
Example outline:
1. Select a tool subset for the request (domain pack + deterministic sampling).
2. Build prompt with `apply_chat_template` including `tools=tools_list`.
3. `generate()` deterministically (`do_sample=False`, `temperature=0.0`).
4. Parse tool call tokens and arguments.
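A minimal end-to-end sketch of this outline, assuming a single hypothetical `get_weather` tool and a placeholder repository id (steps 1 and 4 are simplified: no domain-pack sampling, and the parser only handles the single-call pattern shown above):
```python
import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/functiongemma-270m-arabic-tooluse"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Hypothetical tool schema; in practice build this from your domain pack (step 1).
tools_list = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

# "What is the weather like in Cairo right now?" (Egyptian Arabic)
messages = [{"role": "user", "content": "ايه حالة الطقس في القاهرة دلوقتي؟"}]

# Step 2: pass the tools through the chat template so the model sees them.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=tools_list,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Step 3: deterministic decoding.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
completion = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False)

# Step 4: parse the tool-call pattern described above.
match = re.search(r"<start_function_call>call:(\w+)\s*(\{.*?\})<end_function_call>", completion, re.DOTALL)
if match:
    tool_name, args = match.group(1), json.loads(match.group(2))
    print(tool_name, args)
else:
    print(completion)  # non-tool request: short Arabic reply
```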
## Known limitations / improvement ideas
- Some outputs may translate slot values into English (e.g., “Abu Dhabi”, “ID renewal”).
- Mitigations: stronger developer prompt constraints, post-processing, adding explicit anti-translation supervision, and/or filtering/rebalancing training examples where values are English.
- Parsed OK rate < 1.0: formatting consistency can be improved with:
  - longer training
  - a slightly stronger prompt
  - adding more negative/no-tool examples with explicit non-tool responses