HeshamHaroon's picture
Update: Auto-evaluation on Space startup
de63c9e verified
---
title: Arabic Function Calling Leaderboard
emoji: ๐Ÿ†
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- arabic
- function-calling
- leaderboard
- llm-evaluation
---
# ๐Ÿ† Arabic Function Calling Leaderboard
ู„ูˆุญุฉ ุชู‚ูŠูŠู… ุงุณุชุฏุนุงุก ุงู„ุฏูˆุงู„ ุจุงู„ุนุฑุจูŠุฉ
## Overview
The **Arabic Function Calling Leaderboard (AFCL)** evaluates Large Language Models on their ability to:
1. Understand Arabic queries (MSA + Dialects)
2. Select appropriate functions from available options
3. Extract correct arguments from Arabic text
4. Handle parallel and complex function calls
5. Detect when no function should be called
## Models Evaluated
- **Arabic-Native**: Jais, ALLaM, SILMA, AceGPT
- **Multilingual**: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya
## Dataset
๐Ÿ“Š **Dataset**: [HeshamHaroon/Arabic_Function_Calling](https://huggingface.co/datasets/HeshamHaroon/Arabic_Function_Calling)
- **1,470 total samples** across 10 categories
- Simple, Multiple, Parallel, Parallel Multiple
- Irrelevance Detection
- Dialect Handling (Egyptian, Gulf, Levantine)
## Evaluation
The leaderboard automatically evaluates models using the HuggingFace Inference API when the Space starts.
## Citation
```bibtex
@misc{afcl2024,
title={Arabic Function Calling Leaderboard},
author={Hesham Haroon},
year={2024},
url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard}
}
```