---
language:
- ar
license: apache-2.0
base_model: unsloth/functiongemma-270m-it
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- fine-tuned
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---

# AISA-AR-FunctionCall-FT

**Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning**

`AISA-AR-FunctionCall-FT` is a fully fine-tuned Arabic function-calling model built on top of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) and optimized for structured tool invocation in Arabic agentic systems. The model converts natural Arabic requests into structured, executable API calls, enabling reliable integration between language models and external tools.

> This model is part of the **AISA** (Agentic AI Systems Architecture) initiative.

## Try the Model in Google Colab

You can run a full inference example using the notebook below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing)

The notebook demonstrates:

- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs

---

## Model Overview

| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-FT |
| **Base model** | unsloth/functiongemma-270m-it |
| **Architecture** | Gemma 3 (270M parameters) |
| **Fine-tuning type** | Full-parameter supervised fine-tuning |
| **Primary task** | Arabic function calling / tool invocation |

The model is designed to translate Arabic natural-language requests into structured tool calls following the FunctionGemma tool-calling format.
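As a quick sketch of the workflow the notebook walks through (load the model, pass a tool schema, parse the output), the snippet below may help. It is illustrative rather than authoritative: the chat-template and dtype details are assumptions carried over from the base FunctionGemma model, and `GET_WEATHER`, `generate_tool_call`, and `parse_tool_call` are hypothetical helpers, not part of the released code.

```python
import re

MODEL_ID = "AISA-Framework/AISA-AR-FunctionCall-FT"

# A tool schema mirroring the get_weather example shown later in this card
# (illustrative; the actual training schemas live in the dataset repo).
GET_WEATHER = {
    "name": "get_weather",
    "description": "Get the weather forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
    },
}


def parse_tool_call(text: str):
    """Parse a `call:name{key:value, ...}` completion into (name, args).

    Minimal parser: argument values are kept as raw strings.
    Returns None when no tool call is found.
    """
    m = re.search(r"call:(\w+)\{(.*?)\}", text, re.DOTALL)
    if m is None:
        return None
    name, body = m.group(1), m.group(2)
    args = {}
    for pair in body.split(","):
        if ":" in pair:
            key, value = pair.split(":", 1)
            args[key.strip()] = value.strip()
    return name, args


def generate_tool_call(user_message: str) -> str:
    """Run the model and return the raw completion (downloads the checkpoint).

    Assumes the tokenizer ships a FunctionGemma chat template that accepts
    a `tools` argument, as recent transformers versions support.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages, tools=[GET_WEATHER], add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=64)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)


# Example usage (requires network access to download the model):
# completion = generate_tool_call("ما حالة الطقس في الرياض اليوم؟")
# print(parse_tool_call(completion))
```

The Colab notebook remains the reference implementation; the parser above only covers the single-call output format shown in the Example Usage section.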
---

## Key Capabilities

- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Consistent output formatting for structured execution environments

**Supported domains:**

| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |

---

## Dataset

The model is trained on **AISA-AR-FunctionCall**, a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:

- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling

**Dataset splits:**

| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |

**Dataset includes:**

- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations

Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall)

---

## Training Methodology

The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured output generation.

**Key pipeline steps:**

1. Structural dataset auditing
2. Enum constraint repair
3. Tool schema normalization
4. Tool pruning (36 → 27 tools)
5. Tool sampling to prevent prompt truncation
6. FunctionGemma-compatible chat serialization
7. Completion-only supervised fine-tuning

**Training configuration:**

| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |

---

## Evaluation Results

Evaluation was performed on a held-out test set of **5,079 samples**.
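For intuition about what the exact-match style metrics below measure, here is a minimal sketch, assuming each prediction and gold label has already been parsed into a `(name, args)` tuple (or `None` on parse failure). The helper names `function_name_accuracy` and `full_tool_call_match` are illustrative; this is not the released evaluation code.

```python
def function_name_accuracy(preds, golds):
    """Fraction of examples where the predicted tool name matches the gold name."""
    hits = sum(p is not None and p[0] == g[0] for p, g in zip(preds, golds))
    return hits / len(golds)


def full_tool_call_match(preds, golds):
    """Fraction where the tool name and every argument match the gold call exactly."""
    hits = sum(p is not None and p == g for p, g in zip(preds, golds))
    return hits / len(golds)


# Toy example: second prediction picks the right tool but a wrong argument.
golds = [("get_weather", {"city": "الرياض", "days": "1"}),
         ("get_weather", {"city": "جدة", "days": "3"})]
preds = [("get_weather", {"city": "الرياض", "days": "1"}),
         ("get_weather", {"city": "جدة", "days": "2"})]

print(function_name_accuracy(preds, golds))  # 1.0
print(full_tool_call_match(preds, golds))    # 0.5
```

This illustrates why Full Tool-Call Match is the strictest metric reported: a single wrong argument value fails the whole call even when the tool name is correct.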
### Clean Positive Evaluation (n = 2,873)

| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | **0.6547** |
| Full Tool-Call Match | 0.0056 | **0.3362** |
| Argument Key F1 | 0.0600 | **0.5728** |
| Argument Exact Match | 0.0422 | **0.6377** |
| Parse Failure Rate | 0.8726 | **0.0084** |
| Format Validity | 0.1274 | **0.9916** |
| Hallucination Rate | 0.0003 | 0.0226 |

> **Key improvement:** Parse failure reduced from **87% → <1%**

### Dialect Performance

| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |

Fine-tuning significantly reduces dialect disparity compared to the baseline model.

---

## Known Limitations

Remaining errors are primarily **semantic**, including:

- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)

Structured formatting errors are largely eliminated.

---

## Example Usage

**Prompt** ("What is the weather in Riyadh today?"):

```
ما حالة الطقس في الرياض اليوم؟
```

**Model output:**

```
call:get_weather{ city:الرياض, days:1 }
```

The structured call can then be executed by the application runtime.

---

## Intended Use

This model is designed for:

- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling

### Out-of-Scope Uses

This model is **not** designed for:

- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation

---

## Related Models

| Model | Description |
|---|---|
| [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model |

---

## AISA Framework

This model is part of the AISA initiative for building reliable agentic AI systems.
Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models)

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)