---
library_name: transformers
language:
- ar
tags:
- function-calling
- tool-use
- arabic
- instruction-tuning
- gemma
- transformers
license: apache-2.0
base_model: google/functiongemma-270m-it
---

# FunctionGemma-270M Arabic Tool Use

This model is a finetuned version of **`google/functiongemma-270m-it`** for Arabic **tool use / function calling** across multiple dialects and domains.

It is trained to produce **exactly one tool call** when a tool is required, using **FunctionGemma-native tool formatting** (special function-call tokens) and structured JSON arguments.

## Base model

- `google/functiongemma-270m-it`

## Dataset

- `metga97/arabic-tooluse-functiongemma-v1`

## What the model outputs

When a tool is required, generation should include a FunctionGemma tool call pattern such as:

- `call:TOOL_NAME{ ...json args... }`

For non-tool requests, it returns a short Arabic reply.

## Evaluation (by slang / dialect)

Evaluated on the test split of `metga97/arabic-tooluse-functiongemma-v1`.

### Overall

- Parsed OK rate: **0.891**
- Tool name accuracy: **0.9921**
- Strict EM: **0.6564**
- Key-F1 (avg): **0.9925**
- Missed-call rate: **0.0064**
- False-call rate (negatives): **0.0**

### Strict EM by slang / dialect

- **Egyptian**: 0.6791 (denom_calls: 1069)
- **Gulf**: 0.6237 (denom_calls: 1172)
- **Levantine**: 0.6558 (denom_calls: 706)
- **MSA**: 0.6804 (denom_calls: 1408)
- **Maghrebi**: 0.5455 (denom_calls: 176)

### Strict EM by domain

- banking_finance: 0.6255 (denom_calls: 542)
- ecommerce: 0.64 (denom_calls: 550)
- government_services: 0.7651 (denom_calls: 613)
- healthcare: 0.5754 (denom_calls: 577)
- islamic_services: 0.7119 (denom_calls: 597)
- travel: 0.6028 (denom_calls: 564)
- utilities: 0.4652 (denom_calls: 561)
- weather: 0.8653 (denom_calls: 527)

## Inference (important)

### 1) Use left padding for decoder-only generation

Set:

- `tokenizer.padding_side = "left"`
- `tokenizer.pad_token = tokenizer.eos_token` (if missing)

### 2) Pass tools via `apply_chat_template(..., tools=tools_list)`

This is critical for FunctionGemma-style function calling.

Example outline:

1. Select a tool subset for the request (domain pack + deterministic sampling).
2. Build prompt with `apply_chat_template` including `tools=tools_list`.
3. `generate()` deterministically (`do_sample=False`, `temperature=0.0`).
4. Parse tool call tokens and arguments.

## Known limitations / improvement ideas

- Some outputs may translate slot values into English (e.g., “Abu Dhabi”, “ID renewal”).
  - Mitigations: stronger developer prompt constraints, post-processing, adding explicit anti-translation supervision, and/or filtering/rebalancing training examples where values are English.
- Parsed OK < 1.0: you can improve formatting consistency with:
  - longer training
  - slightly stronger prompt
  - adding more negative/no-tool examples with explicit non-tool responses
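
## Inference sketch (illustrative)

The inference outline in the Inference section (left padding, `apply_chat_template(..., tools=...)`, greedy `generate()`, then parsing) could look roughly like the sketch below. It is not the author's evaluation code. The helper names (`configure_tokenizer`, `extract_tool_call`, `generate_tool_call`) and the `model_id` parameter are hypothetical, and the parser only assumes the `call:TOOL_NAME{ ... }` pattern described under "What the model outputs". It first tries to read the arguments as JSON and otherwise returns the raw argument text, since the exact FunctionGemma argument serialization may differ from plain JSON.

```python
import json
import re
from typing import Optional

# Assumed pattern from this card: call:TOOL_NAME{ ...args... }
CALL_RE = re.compile(r"call:([A-Za-z_][\w.-]*)\s*\{")


def configure_tokenizer(tokenizer):
    """Apply the padding settings recommended for decoder-only generation."""
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


def extract_tool_call(text: str) -> Optional[dict]:
    """Return the first tool call found in `text`, or None if there is none."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name = match.group(1)
    start = match.end() - 1  # index of the opening brace

    # First attempt: treat the arguments as a JSON object.
    try:
        args, end = json.JSONDecoder().raw_decode(text, start)
        return {"name": name, "raw_args": text[start:end], "args": args}
    except json.JSONDecodeError:
        pass

    # Fallback: naive brace matching, keeping the raw text for custom parsing.
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return {"name": name, "raw_args": text[start:i + 1], "args": None}
    return {"name": name, "raw_args": text[start:], "args": None}


def generate_tool_call(model_id, messages, tools, max_new_tokens=128):
    """End-to-end sketch; loads a model, so it only runs when called."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = configure_tokenizer(AutoTokenizer.from_pretrained(model_id))
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tools are passed through the chat template, as this card recommends.
    inputs = tokenizer.apply_chat_template(
        messages,
        tools=tools,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        # Greedy decoding; temperature is not used when sampling is off.
        output_ids = model.generate(
            **inputs, do_sample=False, max_new_tokens=max_new_tokens
        )
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    # Keep special tokens, since the call markers may be special tokens.
    text = tokenizer.decode(new_tokens, skip_special_tokens=False)
    return extract_tool_call(text)
```

For example, with a hypothetical `get_weather` tool, `extract_tool_call('call:get_weather{"city": "أبوظبي"}')` yields the tool name plus a parsed argument dict, while a plain Arabic reply with no call yields `None`. When `args` comes back as `None`, the `raw_args` string can be handed to a format-specific parser.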