README.md · AISA-Framework/AISA-AR-FunctionCall-Think at main

File size: 5,787 Bytes

49b980c

---
language:
- ar
license: apache-2.0
base_model: AISA-Framework/AISA-AR-FunctionCall-FT
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- reasoning
- lora
- think
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---

# AISA-AR-FunctionCall-Think

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/21Mxl67VW-RQFiXTnvheT.png" width="700"/>
</p>

**Reasoning-Augmented Arabic Structured Tool Calling**

`AISA-AR-FunctionCall-Think` is a reasoning-enhanced variant of the Arabic function-calling model introduced in the **AISA-AR-FunctionCall** framework. The model generates an intermediate reasoning trace before invoking a tool, enabling transparent decision-making for Arabic agentic systems.

This model extends [AISA-AR-FunctionCall-FT](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-FT) by introducing explicit reasoning supervision using `<think>` blocks prior to tool execution.

---

## Model Overview

| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-Think |
| **Base model** | AISA-AR-FunctionCall-FT |
| **Architecture** | Gemma 3 (FunctionGemma 270M) |
| **Training method** | LoRA reasoning fine-tuning |
| **Primary task** | Arabic reasoning-aware function calling |

The model produces outputs in the following pattern:

```
<think>
reasoning about tool selection
</think>
<start_function_call>
call:tool_name{arguments}
</end_function_call>
```

This allows the system to expose the reasoning behind tool selection.

---

## Key Capabilities

- Reasoning-aware tool selection
- Explicit decision traces for tool invocation
- Improved argument extraction consistency
- Interpretable structured execution

**Supported domains:**

| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |

**Supported Arabic dialect groups:**

- Modern Standard Arabic (MSA)
- Gulf
- Egyptian
- Levantine
- Maghrebi

---

## Training Dataset

Training uses a subset of the [AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall) dataset with reasoning annotations.

| Property | Value |
|---|---|
| Dataset size | ~12k reasoning-augmented samples |
| Dialect coverage | 5 Arabic dialects |
| Domains | 8 real-world domains |
| Tools | 27 structured tools |

---

## Training Methodology

The reasoning model is trained by augmenting assistant outputs with explicit reasoning segments.

**Training format:**

```
<think>
tool selection reasoning
</think>
<start_function_call>
call:tool{arguments}
</end_function_call>
```

Reasoning supervision is enforced during inference by priming the model to begin its generation with `<think>`.

**Training configuration:**

| Parameter | Value |
|---|---|
| Training type | LoRA fine-tuning |
| LoRA rank | 64 |
| Alpha | 64 |
| Dropout | 0.05 |
| Trainable parameters | ~5.36% |
| Epochs | 3 |
| Learning rate | 3e-6 |
| Effective batch size | 32 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |

Additional training signals include **negative tool examples** to reduce hallucinated tool calls when no tool invocation is required.

---

## Evaluation Results

Evaluation is performed on a strict reasoning evaluation subset.

### Strict Evaluation (n = 240)

| Metric | Score |
|---|---|
| Tool Call Rate | 0.992 |
| Think-Before-Call Rate | **1.000** |
| Function Name Accuracy | 0.992 |
| Argument F1 | **1.000** |
| Decision Accuracy | 0.992 |
| Hallucination Rate | **0.000** |

These results indicate that the model consistently performs reasoning before tool invocation and achieves near-perfect structured alignment within the evaluated subset.

### Important Note on Format Validation

Standard function-call validators may classify reasoning outputs as **parse failures** because `<think>` tokens appear before the function call marker.

This does **not** indicate structural instability — it reflects a difference in serialization format. When reasoning segments are permitted, tool invocation correctness remains near-perfect.

---

## Example Usage

**User query:**

```
ما حالة الطقس في الرياض اليوم؟
```

**Model output:**

```
<think>
المستخدم يريد معرفة حالة الطقس في مدينة الرياض، لذا يجب استخدام أداة get_weather.
</think>
<start_function_call>
call:get_weather{city:<escape>الرياض<escape>,days:1}
</end_function_call>
```

---

## Intended Use

This model is intended for:

- Research on reasoning-aware tool calling
- Interpretable agent systems
- Arabic reasoning supervision experiments
- Debugging tool selection behavior

### Production Recommendation

This model is an **exploratory research variant**. For production deployment, we recommend using:

[AISA-AR-FunctionCall-FT](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-FT)

---

## Related Resources

| Resource | Link |
|---|---|
| Dataset | [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall) |
| Production model | [AISA-AR-FunctionCall-FT](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-FT) |
| Model collection | [AISA Arabic FunctionCall](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models) |

---

## Paper

**From Language to Action in Arabic: Reliable Structured Tool Calling via Data-Centric Fine-Tuning**

*AISA Framework*

---

## AISA Framework

This model is part of the **AISA** (Agentic AI Systems Architecture) initiative for building reliable multilingual AI agents.

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)