---
language:
- ar
license: apache-2.0
base_model: unsloth/functiongemma-270m-it
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- fine-tuned
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---


# AISA-AR-FunctionCall-FT

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/vnL90Tybn1528x21dMNsd.png" width="700"/>
</p>

**Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning**

`AISA-AR-FunctionCall-FT` is a full-parameter fine-tune of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) for Arabic function calling, optimized for structured tool invocation in Arabic agentic systems.

The model converts natural Arabic requests into structured executable API calls, enabling reliable integration between language models and external tools.

> This model is part of the **AISA** (Agentic AI Systems Architecture) initiative.


## Try the Model in Google Colab

You can run a full inference example using the notebook below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing)

The notebook demonstrates:

- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs

---

## Model Overview

| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-FT |
| **Base model** | unsloth/functiongemma-270m-it |
| **Architecture** | Gemma 3 (270M parameters) |
| **Fine-tuning type** | Full-parameter supervised fine-tuning |
| **Primary task** | Arabic function calling / tool invocation |

The model is designed to translate Arabic natural language requests into structured tool calls following the FunctionGemma tool-calling format.

---

## Key Capabilities

- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Output suitable for structured execution environments

**Supported domains:**

| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |

---

## Dataset

The model is trained on **AISA-AR-FunctionCall**, a production-ready Arabic function-calling dataset built through a data-centric pipeline:

- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling
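
As an illustration of what an enum audit might look like, the sketch below checks an annotated call against the `enum` constraints of its tool schema. The schema layout (JSON-Schema-style `parameters.properties`), the `units` argument, and the helper name `find_enum_violations` are assumptions made for this example; the actual pipeline code is not described in this card.

```python
def find_enum_violations(call, schema):
    """Return (key, value, allowed) for every argument outside its schema enum."""
    props = schema["parameters"]["properties"]
    violations = []
    for key, value in call["arguments"].items():
        allowed = props.get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            violations.append((key, value, allowed))
    return violations


# Hypothetical schema and annotations, for illustration only.
schema = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
    },
}
bad = {"name": "get_weather", "arguments": {"city": "الرياض", "units": "مئوية"}}
good = {"name": "get_weather", "arguments": {"city": "الرياض", "units": "celsius"}}

print(find_enum_violations(bad, schema))   # one violation, on "units"
print(find_enum_violations(good, schema))  # no violations
```

Samples flagged this way could then be repaired (mapped to a valid enum value) or dropped, depending on the audit policy.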

**Dataset splits:**

| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |

**Dataset includes:**

- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations

Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall)

---

## Training Methodology

The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured execution.

**Key pipeline steps:**

1. Structural dataset auditing
2. Enum constraint repair
3. Tool schema normalization
4. Tool pruning (36 → 27 tools)
5. Tool sampling to prevent prompt truncation
6. FunctionGemma-compatible chat serialization
7. Completion-only supervised fine-tuning
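
Step 5 can be pictured with a small sketch: instead of serializing all 27 tool schemas into every prompt, keep the gold tool and add a few randomly chosen distractors. The card does not state how sampling was actually done, so the function name `sample_tools`, the subset size `k`, and the rule of always keeping the gold tool are assumptions.

```python
import random


def sample_tools(all_tools, gold_tool_name, k=5, seed=None):
    """Pick k tools for one prompt: the gold tool plus k-1 random distractors."""
    rng = random.Random(seed)
    gold = next(t for t in all_tools if t["name"] == gold_tool_name)
    distractors = [t for t in all_tools if t["name"] != gold_tool_name]
    chosen = rng.sample(distractors, min(k - 1, len(distractors)))
    chosen.append(gold)
    rng.shuffle(chosen)  # avoid leaking the answer through tool position
    return chosen


# Placeholder tool list; real schemas would carry descriptions and parameters.
tools = [{"name": f"tool_{i}"} for i in range(27)]
subset = sample_tools(tools, "tool_3", k=5, seed=0)
print([t["name"] for t in subset])
```

Shuffling the subset keeps the model from learning that the correct tool always appears in a fixed slot.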

**Training configuration:**

| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |
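
For a rough sense of scale, the sketch below derives the number of optimizer steps from the table and the train split size, and evaluates a plain cosine decay at a few points. It assumes no warmup, that the last partial batch is kept, and that the learning rate decays to zero; none of these details are stated in this card.

```python
import math

TRAIN_SAMPLES = 41_104   # train split size from the dataset table
EFFECTIVE_BATCH = 32     # per-device batch x gradient accumulation x devices
EPOCHS = 2
PEAK_LR = 2e-5

steps_per_epoch = math.ceil(TRAIN_SAMPLES / EFFECTIVE_BATCH)  # 1285
total_steps = steps_per_epoch * EPOCHS                        # 2570


def cosine_lr(step, total=total_steps, peak=PEAK_LR):
    """Cosine decay from peak to zero (assumed: no warmup)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step))
```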

---

## Evaluation Results

Evaluation was performed on a held-out test set of **5,079 samples**.

### Clean Positive Evaluation (n = 2,873)

| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | **0.6547** |
| Full Tool-Call Match | 0.0056 | **0.3362** |
| Argument Key F1 | 0.0600 | **0.5728** |
| Argument Exact Match | 0.0422 | **0.6377** |
| Parse Failure Rate | 0.8726 | **0.0084** |
| Format Validity | 0.1274 | **0.9916** |
| Hallucination Rate | 0.0003 | 0.0226 |

> **Key improvement:** Parse failure rate dropped from **87.3%** to **0.8%**.
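
The card does not spell out how each metric is computed. The sketch below shows one plausible set of definitions, with every metric averaged over all examples and an unparseable prediction represented as `None`. These definitions, the helper names, and the `get_prayer_times` tool are assumptions for illustration. Format validity is taken as one minus the parse failure rate, which matches the two rows summing to 1 in the table.

```python
def key_f1(pred_keys, gold_keys):
    """F1 between predicted and gold argument-key sets for one example."""
    pred_keys, gold_keys = set(pred_keys), set(gold_keys)
    if not pred_keys and not gold_keys:
        return 1.0
    overlap = len(pred_keys & gold_keys)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_keys)
    recall = overlap / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


def score(predictions, references):
    """predictions[i] is a parsed call dict, or None if parsing failed."""
    n = len(references)
    name_hits = full_hits = failures = 0
    f1_total = 0.0
    for pred, gold in zip(predictions, references):
        if pred is None:
            failures += 1
            continue
        name_hits += int(pred["name"] == gold["name"])
        full_hits += int(pred == gold)
        f1_total += key_f1(pred["arguments"], gold["arguments"])
    return {
        "function_name_accuracy": name_hits / n,
        "full_tool_call_match": full_hits / n,
        "argument_key_f1": f1_total / n,
        "parse_failure_rate": failures / n,
        "format_validity": 1 - failures / n,
    }


# Toy data: one exact match, one extra argument, one parse failure.
references = [
    {"name": "get_weather", "arguments": {"city": "الرياض", "days": 1}},
    {"name": "get_prayer_times", "arguments": {"city": "جدة"}},
    {"name": "get_weather", "arguments": {"city": "القاهرة"}},
]
predictions = [
    {"name": "get_weather", "arguments": {"city": "الرياض", "days": 1}},
    {"name": "get_prayer_times", "arguments": {"city": "جدة", "date": "today"}},
    None,
]
print(score(predictions, references))
```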

### Dialect Performance

| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |

After fine-tuning, function accuracy ranges from 0.616 (Maghrebi) to 0.761 (MSA). Fine-tuning significantly reduces dialect disparity compared to the baseline model.

---

## Known Limitations

Remaining errors are primarily **semantic**, including:

- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)

Structured formatting errors are largely eliminated (parse failure rate of 0.0084 on the clean positive subset).

---

## Example Usage

**Prompt:**

```
ما حالة الطقس في الرياض اليوم؟
```

**Model output:**

```
<start_function_call>
call:get_weather{
  city:<escape>الرياض<escape>,
  days:1
}
<end_function_call>
```

The structured call can then be executed by the application runtime.
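
Before execution, the raw text has to be turned into a function name and an argument dictionary. The following is a minimal parsing sketch for flat calls shaped like the example above; it is not the official FunctionGemma parser, and it does not handle nested objects or lists. The helper names `parse_function_call` and `_coerce` are hypothetical.

```python
import re

# One call block: <start_function_call> call:NAME{ ...args... } <end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>\s*call:(?P<name>[\w.]+)\s*\{(?P<body>.*?)\}"
    r"\s*<end_function_call>",
    re.DOTALL,
)
# One argument: key:<escape>text<escape>  or  key:bare_scalar
ARG_RE = re.compile(
    r"(?P<key>\w+)\s*:\s*(?:<escape>(?P<text>.*?)<escape>|(?P<raw>[^,{}]+))",
    re.DOTALL,
)


def _coerce(raw):
    """Convert a bare scalar to bool, int, or float when possible."""
    raw = raw.strip()
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_function_call(output):
    """Return {'name': ..., 'arguments': {...}}, or None if no call is found."""
    match = CALL_RE.search(output)
    if match is None:
        return None
    arguments = {}
    for arg in ARG_RE.finditer(match.group("body")):
        if arg.group("text") is not None:
            arguments[arg.group("key")] = arg.group("text")  # escaped string
        else:
            arguments[arg.group("key")] = _coerce(arg.group("raw"))
    return {"name": match.group("name"), "arguments": arguments}


sample = (
    "<start_function_call>\n"
    "call:get_weather{\n"
    "  city:<escape>الرياض<escape>,\n"
    "  days:1\n"
    "}\n"
    "<end_function_call>"
)
print(parse_function_call(sample))
```

Returning `None` on a failed match lets the calling application treat unparseable output as a parse failure rather than raising an exception.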

---

## Intended Use

This model is designed for:

- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling

### Out-of-Scope Uses

This model is **not** designed for:

- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation

---

## Related Models

| Model | Description |
|---|---|
| [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model |

---

## AISA Framework

This model is part of the AISA initiative for building reliable agentic AI systems.

Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models)

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)