|
|
--- |
|
|
title: Arabic Function Calling Leaderboard |
|
|
emoji: ๐ |
|
|
colorFrom: green |
|
|
colorTo: blue |
|
|
sdk: gradio |
|
|
sdk_version: 4.44.0 |
|
|
app_file: app.py |
|
|
pinned: true |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- arabic |
|
|
- function-calling |
|
|
- leaderboard |
|
|
- llm-evaluation |
|
|
--- |
|
|
|
|
|
# ๐ Arabic Function Calling Leaderboard |
|
|
|
|
|
ููุญุฉ ุชูููู
ุงุณุชุฏุนุงุก ุงูุฏูุงู ุจุงูุนุฑุจูุฉ |
|
|
|
|
|
## Overview |
|
|
|
|
|
The **Arabic Function Calling Leaderboard (AFCL)** evaluates Large Language Models on their ability to: |
|
|
|
|
|
1. Understand Arabic queries (MSA + Dialects) |
|
|
2. Select appropriate functions from available options |
|
|
3. Extract correct arguments from Arabic text |
|
|
4. Handle parallel and complex function calls |
|
|
5. Detect when no function should be called |
|
|
|
|
|
## Models Evaluated |
|
|
|
|
|
- **Arabic-Native**: Jais, ALLaM, SILMA, AceGPT |
|
|
- **Multilingual**: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya |
|
|
|
|
|
## Dataset |
|
|
|
|
|
๐ **Dataset**: [HeshamHaroon/Arabic_Function_Calling](https://huggingface.co/datasets/HeshamHaroon/Arabic_Function_Calling) |
|
|
|
|
|
- **1,470 total samples** across 10 categories |
|
|
- Simple, Multiple, Parallel, Parallel Multiple |
|
|
- Irrelevance Detection |
|
|
- Dialect Handling (Egyptian, Gulf, Levantine) |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
The leaderboard automatically evaluates models using the HuggingFace Inference API when the Space starts. |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{afcl2024, |
|
|
title={Arabic Function Calling Leaderboard}, |
|
|
author={Hesham Haroon}, |
|
|
year={2024}, |
|
|
url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard} |
|
|
} |
|
|
``` |
|
|
|