---
language:
- ar
license: apache-2.0
base_model: unsloth/functiongemma-270m-it
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- fine-tuned
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---
# AISA-AR-FunctionCall-FT
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/vnL90Tybn1528x21dMNsd.png" width="700"/>
</p>
**Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning**
`AISA-AR-FunctionCall-FT` is a fully fine-tuned Arabic function-calling model built on top of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) and optimized for structured tool invocation in Arabic agentic systems.
The model converts natural Arabic requests into structured executable API calls, enabling reliable integration between language models and external tools.
> This model is part of the **AISA** (Agentic AI Systems Architecture) initiative.
## Try the Model in Google Colab
You can run a full inference example using the notebook below.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing)
The notebook demonstrates:
- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs
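Outside Colab, the same flow can be sketched with `transformers`. The repo id and generation settings below are assumptions; adjust them to the actual checkpoint and the notebook's settings:

```python
# Sketch of local inference; repo id, chat-template behavior, and
# generation settings are assumptions, not verified values.
MODEL_ID = "AISA-Framework/AISA-AR-FunctionCall-FT"  # assumed repo id

def build_messages(user_request: str) -> list[dict]:
    """Wrap an Arabic request in a minimal chat message list."""
    return [{"role": "user", "content": user_request}]

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    messages = build_messages("ما حالة الطقس في الرياض اليوم؟")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=128)
    # Decode only the newly generated tokens (the tool call).
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=False))
```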
---
## Model Overview
| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-FT |
| **Base model** | unsloth/functiongemma-270m-it |
| **Architecture** | Gemma 3 (270M parameters) |
| **Fine-tuning type** | Full-parameter supervised fine-tuning |
| **Primary task** | Arabic function calling / tool invocation |
The model is designed to translate Arabic natural language requests into structured tool calls following the FunctionGemma tool-calling format.
---
## Key Capabilities
- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Structured outputs suitable for execution environments
**Supported domains:**

- Travel
- Utilities
- Islamic services
- Weather
- Healthcare
- Banking & finance
- E-commerce
- Government services
---
## Dataset
The model is trained on **AISA-AR-FunctionCall** — a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:
- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling
**Dataset splits:**
| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |
**Dataset includes:**
- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations
Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall)
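The dataset can be pulled with the `datasets` library. The record field names used in the helper below (`query`, `tool_call`) are assumptions for illustration; inspect the actual dataset features before relying on them:

```python
# Hypothetical sketch of loading the dataset and sanity-checking a record.
# The field names in REQUIRED_FIELDS are assumptions, not verified schema.
REQUIRED_FIELDS = {"query", "tool_call"}  # assumed schema

def has_required_fields(record: dict) -> bool:
    """Check that a record carries the fields this sketch assumes."""
    return REQUIRED_FIELDS.issubset(record)

if __name__ == "__main__":
    from datasets import load_dataset

    ds = load_dataset("AISA-Framework/AISA-AR-FunctionCall", split="train")
    print(ds[0])
    print(has_required_fields(ds[0]))
```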
---
## Training Methodology
The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured execution.
**Key pipeline steps:**
1. Structural dataset auditing
2. Enum constraint repair
3. Tool schema normalization
4. Tool pruning (36 → 27 tools)
5. Tool sampling to prevent prompt truncation
6. FunctionGemma-compatible chat serialization
7. Completion-only supervised fine-tuning
**Training configuration:**
| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |
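The table above maps fairly directly onto a TRL `SFTConfig`. This is a sketch, not the training script: the per-device batch size / gradient-accumulation split (8 × 4 = 32 effective) is an assumption, since only the effective batch size is stated.

```python
# Sketch of the training configuration above as TRL SFTConfig settings.
# The batch-size split is an assumption; only 32 effective is stated.
from trl import SFTConfig

config = SFTConfig(
    output_dir="outputs",
    num_train_epochs=2,
    per_device_train_batch_size=8,   # assumed split
    gradient_accumulation_steps=4,   # 8 * 4 = 32 effective batch size
    learning_rate=2e-5,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    bf16=True,
    gradient_checkpointing=True,
)
```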
---
## Evaluation Results
Evaluation was performed on a held-out test set of **5,079 samples**.
### Clean Positive Evaluation (n = 2,873)
| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | **0.6547** |
| Full Tool-Call Match | 0.0056 | **0.3362** |
| Argument Key F1 | 0.0600 | **0.5728** |
| Argument Exact Match | 0.0422 | **0.6377** |
| Parse Failure Rate | 0.8726 | **0.0084** |
| Format Validity | 0.1274 | **0.9916** |
| Hallucination Rate | 0.0003 | 0.0226 |
> **Key improvement:** Parse failure reduced from **87% → <1%**
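The two headline metrics can be computed as below, assuming each parsed call is represented as a dict `{"name": str, "args": dict}`. These definitions are plausible reconstructions of the metrics, not the project's exact scoring code:

```python
# Minimal sketch of the headline metrics, assuming each call is a dict
# {"name": str, "args": dict}. Reconstructed definitions, not official code.

def function_name_accuracy(preds: list[dict], golds: list[dict]) -> float:
    """Fraction of examples where the predicted tool name matches the gold."""
    hits = sum(p["name"] == g["name"] for p, g in zip(preds, golds))
    return hits / len(golds)

def full_tool_call_match(preds: list[dict], golds: list[dict]) -> float:
    """Fraction of examples where both tool name and all arguments match."""
    hits = sum(p == g for p, g in zip(preds, golds))
    return hits / len(golds)
```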
### Dialect Performance
| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |
Fine-tuning significantly reduces dialect disparity compared to the baseline model.
---
## Known Limitations
Remaining errors are primarily **semantic**, including:
- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)
Structured formatting errors are largely eliminated.
---
## Example Usage
**Prompt:**
```
ما حالة الطقس في الرياض اليوم؟
```
**Model output:**
```
<start_function_call>
call:get_weather{
city:<escape>الرياض<escape>,
days:1
}
<end_function_call>
```
The structured call can then be executed by the application runtime.
---
## Intended Use
This model is designed for:
- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling
### Out-of-Scope Uses
This model is **not** designed for:
- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation
---
## Related Models
| Model | Description |
|---|---|
| [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model |
---
## AISA Framework
This model is part of the AISA initiative for building reliable agentic AI systems.
Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models)
---
## License
[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)