| | --- |
| | language: |
| | - ar |
| | license: apache-2.0 |
| | base_model: unsloth/functiongemma-270m-it |
| | tags: |
| | - function-calling |
| | - arabic |
| | - tool-use |
| | - agentic |
| | - gemma |
| | - fine-tuned |
| | datasets: |
| | - AISA-Framework/AISA-AR-FunctionCall |
| | pipeline_tag: text-generation |
| | library_name: transformers |
| | --- |
| | |
| |
|
| | # AISA-AR-FunctionCall-FT |
| |
|
| | <p align="center"> |
| | <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/vnL90Tybn1528x21dMNsd.png" width="700"/> |
| | </p> |
| |
|
| | **Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning** |
| |
|
| | `AISA-AR-FunctionCall-FT` is a fully fine-tuned Arabic function-calling model built on top of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) and optimized for structured tool invocation in Arabic agentic systems. |
| |
|
| | The model converts natural Arabic requests into structured executable API calls, enabling reliable integration between language models and external tools. |
| |
|
| | > This model is part of the **AISA** (Agentic AI Systems Architecture) initiative. |
| |
|
| |
|
| | ## Try the Model in Google Colab |
| |
|
| | You can run a full inference example using the notebook below. |
| |
|
| | [](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing) |
| |
|
| | The notebook demonstrates: |
| |
|
| | - Loading the model |
| | - Defining tool schemas |
| | - Generating structured tool calls |
| | - Parsing function call outputs |
| |
|
| | --- |
| |
|
| | ## Model Overview |
| |
|
| | | Field | Value | |
| | |---|---| |
| | | **Model name** | AISA-AR-FunctionCall-FT | |
| | | **Base model** | unsloth/functiongemma-270m-it | |
| | | **Architecture** | Gemma 3 (270M parameters) | |
| | | **Fine-tuning type** | Full-parameter supervised fine-tuning | |
| | | **Primary task** | Arabic function calling / tool invocation | |
| |
|
| | The model is designed to translate Arabic natural language requests into structured tool calls following the FunctionGemma tool-calling format. |
| |
|
| | --- |
| |
|
| | ## Key Capabilities |
| |
|
| | - Arabic natural language → structured API calls |
| | - Multi-dialect Arabic understanding |
| | - Tool selection and argument extraction |
| | - Structured execution environments |
| |
|
| | **Supported domains:** |
| |
|
| | | Domain | |
| | |---| |
| | | Travel | |
| | | Utilities | |
| | | Islamic services | |
| | | Weather | |
| | | Healthcare | |
| | | Banking & finance | |
| | | E-commerce | |
| | | Government services | |
| |
|
| | --- |
| |
|
| | ## Dataset |
| |
|
| | The model is trained on **AISA-AR-FunctionCall** — a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline: |
| |
|
| | - Dataset auditing |
| | - Schema normalization |
| | - Enum correction |
| | - Tool pruning |
| | - Prompt restructuring |
| | - Tool sampling |
| |
|
| | **Dataset splits:** |
| |
|
| | | Split | Samples | |
| | |---|---| |
| | | Train | 41,104 | |
| | | Validation | 4,568 | |
| | | Test | 5,079 | |
| |
|
| | **Dataset includes:** |
| | - 5 Arabic dialects |
| | - 8 real-world domains |
| | - 27 tool schemas |
| | - Structured tool-call annotations |
| |
|
| | Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall) |
| |
|
| | --- |
| |
|
| | ## Training Methodology |
| |
|
| | The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured execution. |
| |
|
| | **Key pipeline steps:** |
| |
|
| | 1. Structural dataset auditing |
| | 2. Enum constraint repair |
| | 3. Tool schema normalization |
| | 4. Tool pruning (36 → 27 tools) |
| | 5. Tool sampling to prevent prompt truncation |
| | 6. FunctionGemma-compatible chat serialization |
| | 7. Completion-only supervised fine-tuning |
| |
|
| | **Training configuration:** |
| |
|
| | | Parameter | Value | |
| | |---|---| |
| | | Model size | 270M | |
| | | Training type | Full fine-tuning | |
| | | Epochs | 2 | |
| | | Effective batch size | 32 | |
| | | Learning rate | 2e-5 | |
| | | Optimizer | 8-bit AdamW | |
| | | Scheduler | Cosine | |
| | | Precision | BF16 | |
| | | Gradient checkpointing | Enabled | |
| |
|
| | --- |
| |
|
| | ## Evaluation Results |
| |
|
| | Evaluation was performed on a held-out test set of **5,079 samples**. |
| |
|
| | ### Clean Positive Evaluation (n = 2,873) |
| |
|
| | | Metric | Baseline | AISA-AR-FunctionCall-FT | |
| | |---|---|---| |
| | | Function Name Accuracy | 0.0804 | **0.6547** | |
| | | Full Tool-Call Match | 0.0056 | **0.3362** | |
| | | Argument Key F1 | 0.0600 | **0.5728** | |
| | | Argument Exact Match | 0.0422 | **0.6377** | |
| | | Parse Failure Rate | 0.8726 | **0.0084** | |
| | | Format Validity | 0.1274 | **0.9916** | |
| | | Hallucination Rate | 0.0003 | 0.0226 | |
| |
|
| | > **Key improvement:** Parse failure reduced from **87% → <1%** |
| |
|
| | ### Dialect Performance |
| |
|
| | | Dialect | Function Accuracy | |
| | |---|---| |
| | | MSA | 0.761 | |
| | | Gulf | 0.697 | |
| | | Egyptian | 0.683 | |
| | | Levantine | 0.694 | |
| | | Maghrebi | 0.616 | |
| |
|
| | Fine-tuning significantly reduces dialect disparity compared to the baseline model. |
| |
|
| | --- |
| |
|
| | ## Known Limitations |
| |
|
| | Remaining errors are primarily **semantic**, including: |
| |
|
| | - Tool selection ambiguity |
| | - Argument mismatches |
| | - Domain overlap (e.g., weather vs. air quality) |
| |
|
| | Structured formatting errors are largely eliminated. |
| |
|
| | --- |
| |
|
| | ## Example Usage |
| |
|
| | **Prompt:** |
| |
|
| | ``` |
| | ما حالة الطقس في الرياض اليوم؟ |
| | ``` |
| |
|
| | **Model output:** |
| |
|
| | ``` |
| | <start_function_call> |
| | call:get_weather{ |
| | city:<escape>الرياض<escape>, |
| | days:1 |
| | } |
| | <end_function_call> |
| | ``` |
| |
|
| | The structured call can then be executed by the application runtime. |
| |
|
| | --- |
| |
|
| | ## Intended Use |
| |
|
| | This model is designed for: |
| |
|
| | - Arabic AI assistants |
| | - Tool-based agents |
| | - Structured API orchestration |
| | - Arabic enterprise automation |
| | - Research on multilingual tool calling |
| |
|
| | ### Out-of-Scope Uses |
| |
|
| | This model is **not** designed for: |
| |
|
| | - General chatbots or open-ended conversation |
| | - Sensitive decision-making systems |
| | - Safety-critical deployments without additional validation |
| |
|
| | --- |
| |
|
| | ## Related Models |
| |
|
| | | Model | Description | |
| | |---|---| |
| | | [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model | |
| |
|
| | --- |
| |
|
| | ## AISA Framework |
| |
|
| | This model is part of the AISA initiative for building reliable agentic AI systems. |
| |
|
| | Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models) |
| |
|
| | --- |
| |
|
| | ## License |
| |
|
| | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |