AISA-AR-FunctionCall-FT (4-bit Quantized Version)
Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning
AISA-AR-FunctionCall-FT is a fully fine-tuned Arabic function-calling model built on top of FunctionGemma (Gemma 3 270M) and optimized for structured tool invocation in Arabic agentic systems.
The model converts natural Arabic requests into structured executable API calls, enabling reliable integration between language models and external tools.
This model is part of the AISA (Agentic AI Systems Architecture) initiative.
Try the Model in Google Colab
You can run a full inference example using the notebook below.
The notebook demonstrates:
- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs
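As an illustration of the "defining tool schemas" step, the sketch below writes one tool in the JSON-schema style that Hugging Face chat templates commonly accept through the `tools` argument of `apply_chat_template`. The tool name and parameters mirror the usage example later in this card (`get_weather` with `city` and `days`); the descriptions and the `required` list are assumptions, not an official schema from the dataset.

```python
import json

# Hypothetical tool schema: the name and parameters follow the usage example
# in this card, the descriptions and "required" list are illustrative guesses.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "days": {"type": "integer", "description": "Number of forecast days"},
            },
            "required": ["city"],
        },
    },
}

# The list of schemas is what would be passed alongside the chat messages.
tools = [get_weather_tool]
print(json.dumps(tools, ensure_ascii=False, indent=2))
```

Keeping schemas as plain dictionaries makes them easy to serialize, prune, or sample before they are placed in the prompt.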
Model Overview
| Field | Value |
|---|---|
| Model name | AISA-AR-FunctionCall-FT |
| Base model | unsloth/functiongemma-270m-it |
| Architecture | Gemma 3 (270M parameters) |
| Fine-tuning type | Full-parameter supervised fine-tuning |
| Primary task | Arabic function calling / tool invocation |
The model is designed to translate Arabic natural language requests into structured tool calls following the FunctionGemma tool-calling format.
Key Capabilities
- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Output suited to structured execution environments
Supported domains:
| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |
Dataset
The model is trained on AISA-AR-FunctionCall, a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:
- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling
Dataset splits:
| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |
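A quick check of the split sizes in the table, using only the numbers reported above, shows roughly an 81/9/10 division:

```python
# Sample counts copied from the dataset splits table.
splits = {"train": 41_104, "validation": 4_568, "test": 5_079}

total = sum(splits.values())
print(f"total: {total}")
for name, count in splits.items():
    # Share of each split relative to the full dataset.
    print(f"{name}: {count} samples ({count / total:.1%})")
```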
Dataset includes:
- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations
Dataset: AISA-Framework/AISA-AR-FunctionCall
Training Methodology
The model was trained using a data-centric fine-tuning pipeline designed to stabilize structured execution.
Key pipeline steps:
- Structural dataset auditing
- Enum constraint repair
- Tool schema normalization
- Tool pruning (36 → 27 tools)
- Tool sampling to prevent prompt truncation
- FunctionGemma-compatible chat serialization
- Completion-only supervised fine-tuning
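The card does not describe how tool sampling is implemented. One plausible reading, sketched below under that assumption, is to show each training prompt only a small subset of the tool catalogue: the tool the example actually calls plus a few random distractors, so the serialized prompt stays within the context window. The function name `sample_tools`, the subset size `k`, and the placeholder tool names are all hypothetical.

```python
import random

def sample_tools(all_tools, gold_tool, k=5, seed=None):
    """Keep the gold tool plus up to k-1 random distractors, in shuffled order."""
    rng = random.Random(seed)
    distractors = [tool for tool in all_tools if tool != gold_tool]
    chosen = rng.sample(distractors, min(k - 1, len(distractors)))
    chosen.append(gold_tool)
    # Shuffle so the gold tool does not always sit in the same position.
    rng.shuffle(chosen)
    return chosen

# Placeholder names standing in for the 27 tool schemas after pruning.
all_tools = [f"tool_{i}" for i in range(27)]
subset = sample_tools(all_tools, gold_tool="tool_3", k=5, seed=0)
print(subset)
```

Always including the gold tool keeps every training example answerable, while the distractors still force the model to discriminate between tools.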
Training configuration:
| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |
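To make the configuration concrete, the sketch below estimates the number of optimizer steps implied by the table. It assumes the final partial batch of each epoch is kept. The split of the effective batch size into a per-device batch and gradient accumulation steps is hypothetical, since the card reports only the product.

```python
import math

# Values taken from the card.
train_samples = 41_104
effective_batch_size = 32
epochs = 2

# Hypothetical decomposition of the effective batch size (single device assumed).
per_device_batch_size = 8
gradient_accumulation_steps = 4
assert per_device_batch_size * gradient_accumulation_steps == effective_batch_size

# One optimizer step consumes one effective batch.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1285 2570
```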
Evaluation Results
Evaluation was performed on a held-out test set of 5,079 samples.
Clean Positive Evaluation (n = 2,873)
| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | 0.6547 |
| Full Tool-Call Match | 0.0056 | 0.3362 |
| Argument Key F1 | 0.0600 | 0.5728 |
| Argument Exact Match | 0.0422 | 0.6377 |
| Parse Failure Rate | 0.8726 | 0.0084 |
| Format Validity | 0.1274 | 0.9916 |
| Hallucination Rate | 0.0003 | 0.0226 |
Key improvement: the parse failure rate dropped from 87% to under 1%.
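In the table, Format Validity and Parse Failure Rate sum to 1 in both columns, so format validity is simply the share of outputs that parse. The card does not give the exact metric definitions, so the sketch below uses common ones as an assumption: function name accuracy as exact name equality, full tool-call match as name plus identical arguments, and argument key F1 over the sets of argument names. The helper names `key_f1` and `score` are hypothetical.

```python
def key_f1(pred_args, gold_args):
    """F1 between the predicted and gold sets of argument names."""
    pred_keys, gold_keys = set(pred_args), set(gold_args)
    if not pred_keys and not gold_keys:
        return 1.0
    true_positives = len(pred_keys & gold_keys)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_keys)
    recall = true_positives / len(gold_keys)
    return 2 * precision * recall / (precision + recall)

def score(pred, gold):
    """Per-example scores; pred is None when the output could not be parsed."""
    if pred is None:
        return {"parsed": 0.0, "name": 0.0, "full": 0.0, "key_f1": 0.0}
    name_ok = pred["name"] == gold["name"]
    return {
        "parsed": 1.0,
        "name": float(name_ok),
        "full": float(name_ok and pred["args"] == gold["args"]),
        "key_f1": key_f1(pred["args"], gold["args"]),
    }

gold = {"name": "get_weather", "args": {"city": "الرياض", "days": 1}}
pred = {"name": "get_weather", "args": {"city": "الرياض"}}
print(score(pred, gold))  # right tool, one argument missing
print(score(None, gold))  # unparseable output counts as a failure everywhere
```

Averaging each field over the test set gives corpus-level numbers, with the parse failure rate equal to one minus the mean of `parsed`.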
Dialect Performance
| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |
After fine-tuning, function accuracy ranges from 0.616 (Maghrebi) to 0.761 (MSA), and dialect disparity is significantly reduced compared to the baseline model.
Known Limitations
Remaining errors are primarily semantic, including:
- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)
Structured formatting errors are largely eliminated.
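Because the remaining errors are semantic rather than structural, an application may want to check a parsed call against its tool schema before executing it. The sketch below is one minimal way to do that, not part of the released model or dataset. The `validate_call` helper, the schema layout, and the example tool are assumptions for illustration.

```python
# Maps JSON-schema type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def type_ok(value, type_name):
    expected = TYPE_MAP.get(type_name)
    if expected is None:
        return True  # unknown or unspecified type: do not reject
    if isinstance(value, bool) and type_name != "boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, expected)

def validate_call(name, args, schemas):
    """Return a list of problems; an empty list means the call looks well-formed."""
    if name not in schemas:
        return [f"unknown tool: {name}"]
    params = schemas[name]
    props = params.get("properties", {})
    errors = [f"missing required argument: {key}"
              for key in params.get("required", []) if key not in args]
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            errors.append(f"unexpected argument: {key}")
        elif not type_ok(value, spec.get("type")):
            errors.append(f"wrong type for {key}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"value not in enum for {key}")
    return errors

# Hypothetical schema keyed by tool name.
schemas = {
    "get_weather": {
        "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
        "required": ["city"],
    }
}
print(validate_call("get_weather", {"city": "الرياض", "days": 1}, schemas))
print(validate_call("get_weather", {"days": "1"}, schemas))
```

A check like this catches hallucinated tools and malformed arguments, but it cannot tell whether the model picked the wrong tool among several valid ones.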
Example Usage
Prompt:
ما حالة الطقس في الرياض اليوم؟
Model output:
<start_function_call>
call:get_weather{
city:<escape>الرياض<escape>,
days:1
}
<end_function_call>
The structured call can then be executed by the application runtime.
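A minimal sketch of that last step, assuming the output always follows the flat layout shown above: string values wrapped in `<escape>` markers, other values written bare, and no nested objects or arrays. The parser, the `coerce` helper, and the `get_weather` implementation are hypothetical. A production runtime would need to handle nesting and braces inside string values.

```python
import re

CALL_RE = re.compile(
    r"<start_function_call>\s*call:(?P<name>\w+)\s*\{(?P<body>.*?)\}\s*<end_function_call>",
    re.DOTALL,
)
ARG_RE = re.compile(
    r"(?P<key>\w+)\s*:\s*(?:<escape>(?P<text>.*?)<escape>|(?P<raw>[^,\s]+))",
    re.DOTALL,
)

def coerce(raw):
    """Turn an unquoted value into bool, int, or float when possible."""
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw

def parse_function_call(output):
    """Return (name, args) for the first call found, or None on parse failure."""
    match = CALL_RE.search(output)
    if match is None:
        return None
    args = {}
    for arg in ARG_RE.finditer(match.group("body")):
        text = arg.group("text")
        args[arg.group("key")] = text if text is not None else coerce(arg.group("raw"))
    return match.group("name"), args

def get_weather(city, days=1):
    # Hypothetical tool implementation standing in for a real weather API.
    return f"forecast for {city}, {days} day(s)"

TOOLS = {"get_weather": get_weather}

output = (
    "<start_function_call>\ncall:get_weather{\n"
    "city:<escape>الرياض<escape>,\ndays:1\n}\n<end_function_call>"
)
name, args = parse_function_call(output)
result = TOOLS[name](**args)
print(name, args, result)
```

Returning `None` on malformed output lets the caller count parse failures separately from wrong tool choices.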
Intended Use
This model is designed for:
- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling
Out-of-Scope Uses
This model is not designed for:
- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation
Related Models
| Model | Description |
|---|---|
| AISA-AR-FunctionCall-Think | Reasoning-augmented tool-calling model |
AISA Framework
This model is part of the AISA initiative for building reliable agentic AI systems.
Model collection: AISA-Framework/aisa-arabic-functioncall-datasets-and-models
License
Model tree for AISA-Framework/AISA-AR-FunctionCall-FT-q4_k_m
Base model
google/functiongemma-270m-it