File size: 5,957 Bytes
95c1548
cb123ca
 
 
95c1548
 
cb123ca
 
 
 
 
 
 
 
 
 
 
 
3945705
cb123ca
 
 
0945968
cb123ca
 
 
 
 
 
 
 
 
 
48795dc
 
 
 
 
 
 
 
 
 
 
 
 
 
cb123ca
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
95c1548
 
cb123ca
 
 
95c1548
cb123ca
 
 
95c1548
cb123ca
95c1548
cb123ca
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
---
language:
- ar
license: apache-2.0
base_model: unsloth/functiongemma-270m-it
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- fine-tuned
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---


# AISA-AR-FunctionCall-FT

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/vnL90Tybn1528x21dMNsd.png" width="700"/>
</p>

**Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning**

`AISA-AR-FunctionCall-FT` is a fully fine-tuned Arabic function-calling model built on top of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) and optimized for structured tool invocation in Arabic agentic systems.

The model converts natural Arabic requests into structured executable API calls, enabling reliable integration between language models and external tools.

> This model is part of the **AISA** (Agentic AI Systems Architecture) initiative.


## Try the Model in Google Colab

You can run a full inference example using the notebook below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing)

The notebook demonstrates:

- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs

---

## Model Overview

| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-FT |
| **Base model** | unsloth/functiongemma-270m-it |
| **Architecture** | Gemma 3 (270M parameters) |
| **Fine-tuning type** | Full-parameter supervised fine-tuning |
| **Primary task** | Arabic function calling / tool invocation |

The model is designed to translate Arabic natural language requests into structured tool calls following the FunctionGemma tool-calling format.

---

## Key Capabilities

- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Structured execution environments

**Supported domains:**

| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |

---

## Dataset

The model is trained on **AISA-AR-FunctionCall** — a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:

- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling

**Dataset splits:**

| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |

**Dataset includes:**
- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations

Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall)

---

## Training Methodology

The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured execution.

**Key pipeline steps:**

1. Structural dataset auditing
2. Enum constraint repair
3. Tool schema normalization
4. Tool pruning (36 → 27 tools)
5. Tool sampling to prevent prompt truncation
6. FunctionGemma-compatible chat serialization
7. Completion-only supervised fine-tuning

**Training configuration:**

| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |

---

## Evaluation Results

Evaluation was performed on a held-out test set of **5,079 samples**.

### Clean Positive Evaluation (n = 2,873)

| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | **0.6547** |
| Full Tool-Call Match | 0.0056 | **0.3362** |
| Argument Key F1 | 0.0600 | **0.5728** |
| Argument Exact Match | 0.0422 | **0.6377** |
| Parse Failure Rate | 0.8726 | **0.0084** |
| Format Validity | 0.1274 | **0.9916** |
| Hallucination Rate | 0.0003 | 0.0226 |

> **Key improvement:** Parse failure reduced from **87% → <1%**

### Dialect Performance

| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |

Fine-tuning significantly reduces dialect disparity compared to the baseline model.

---

## Known Limitations

Remaining errors are primarily **semantic**, including:

- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)

Structured formatting errors are largely eliminated.

---

## Example Usage

**Prompt:**

```
ما حالة الطقس في الرياض اليوم؟
```

**Model output:**

```
<start_function_call>
call:get_weather{
  city:<escape>الرياض<escape>,
  days:1
}
<end_function_call>
```

The structured call can then be executed by the application runtime.

---

## Intended Use

This model is designed for:

- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling

### Out-of-Scope Uses

This model is **not** designed for:

- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation

---

## Related Models

| Model | Description |
|---|---|
| [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model |

---

## AISA Framework

This model is part of the AISA initiative for building reliable agentic AI systems.

Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models)

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)