---
license: mit
language:
- ar
tags:
- dependency-parsing
- arabic
- dialects
---
# CamelParser-Dialects
**CamelParser-Dialects** is a neural dependency parsing model for **dialectal Arabic** and Modern Standard Arabic (MSA) that produces parses in the **CATiB dependency formalism**.

It is based on the **biaffine attention parser** architecture introduced by Dozat and Manning (2017) and is implemented with [SuPar](https://github.com/yzhangcs/parser).
The encoder is **CamelBERT-MIX**, a language model pretrained on a mix of Modern Standard, dialectal, and Classical Arabic text.

Full details are available in our paper:
**"Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"**

---

## Model Variants and LAS (Labeled Attachment Score) on TEST
| | Checkpoint | Training Data | MSA | EGY | GLF | AVG |
|:-----:|:-----|:--------|:--------|:--------|:--------|:--------|
| | `CAMeL-Lab/camelparser-dialects-MSA` | CamelTB, PATB | 87.3 | 73.0 | 73.3 | 77.9 |
| | `CAMeL-Lab/camelparser-dialects-EGY` | ARZTB | 79.2 | 83.9 | 68.7 | 77.3 |
| | `CAMeL-Lab/camelparser-dialects-GLF` | CamelTB-Gumar | 65.4 | 58.7 | 73.8 | 66.0 |
| | `CAMeL-Lab/camelparser-dialects-MSA-EGY` | CamelTB, PATB, ARZTB | 87.1 | 84.4 | 70.1 | 80.5 |
| | `CAMeL-Lab/camelparser-dialects-MSA-GLF` | CamelTB, PATB, CamelTB-Gumar | 87.2 | 74.4 | 81.0 | 80.9 |
| | `CAMeL-Lab/camelparser-dialects-EGY-GLF` | ARZTB, CamelTB-Gumar | 80.0 | 83.8 | 79.4 | 81.1 |
| ⭐ | `CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF` | CamelTB, PATB, ARZTB, CamelTB-Gumar | 87.2 | 84.2 | 80.3 | 83.9 |

The recommended checkpoint is the **all-variety model (`MSA-EGY-GLF`)**, which achieves the highest average LAS (83.9) across the MSA, EGY, and GLF test sets.

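
For readers unfamiliar with the metric, LAS is commonly defined as the percentage of tokens whose predicted head and relation label both match the gold tree. The sketch below is a minimal illustration, not the evaluation script behind the table: the function names, the toy arcs, and the `ROOT` placeholder label are assumptions made for the example, and the official evaluation may handle punctuation or tokenization differently. It also checks that the AVG value of the all-variety row is consistent with an unweighted mean of its three test-set scores, which is likewise an assumption about how AVG was computed.

```python
from typing import Sequence, Tuple

# One arc per token: (head index, relation label). Head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head matches
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label match
    return 100.0 * uas_hits / len(gold), 100.0 * las_hits / len(gold)


def macro_average(scores: Sequence[float]) -> float:
    """Unweighted mean of per-test-set scores (assumed definition of AVG)."""
    return sum(scores) / len(scores)


# Toy four-token sentence; the labels are illustrative placeholders.
gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ"), (3, "MOD")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "MOD"), (2, "MOD")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # UAS=75.0 LAS=50.0

# MSA, EGY, GLF test LAS of the MSA-EGY-GLF checkpoint, copied from the table.
all_variety_test_las = (87.2, 84.2, 80.3)
print(round(macro_average(all_variety_test_las), 1))  # 83.9
```
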
---

## Model Architecture

- **Encoder**: CamelBERT-MIX
- **Parser**: Deep biaffine attention (Dozat & Manning, 2017)
- **Framework**: [SuPar](https://github.com/yzhangcs/parser)
- **Formalism**: CATiB dependency scheme

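
As a rough illustration of the biaffine arc scorer described by Dozat and Manning (2017), the sketch below scores every candidate head `j` for every dependent `i` as `head[j]^T U dep[i] + head[j]^T u` and then picks the best head per token. It is a pure-Python toy with hand-picked weights. In the actual model the vectors would come from learned projections of the CamelBERT-MIX encoder outputs, a second biaffine classifier would score relation labels, and decoding may enforce a well-formed tree. All names here are hypothetical and do not mirror SuPar's API.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def biaffine_arc_scores(dep: Matrix, head: Matrix, U: Matrix, u: Vector) -> Matrix:
    """scores[i][j] = head[j]^T U dep[i] + head[j]^T u."""
    scores = []
    for d in dep:
        # Project the dependent vector once: U d + u.
        projected = [
            sum(w * x for w, x in zip(row, d)) + bias
            for row, bias in zip(U, u)
        ]
        # Dot the projection with every candidate head vector.
        scores.append([sum(h_k * p_k for h_k, p_k in zip(h, projected)) for h in head])
    return scores


def greedy_heads(scores: Matrix) -> List[int]:
    """Pick the highest-scoring head for each token (no tree constraint)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy setup: head candidates are [ROOT, w1, w2]; dependents are [w1, w2].
head_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dep_vecs = [[2.0, 0.0], [1.0, -1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]  # bilinear weight (identity for readability)
u = [0.0, 0.5]                # head-only bias term

scores = biaffine_arc_scores(dep_vecs, head_vecs, U, u)
print(scores)                # [[2.0, 0.5, 2.5], [1.0, -0.5, 0.5]]
print(greedy_heads(scores))  # [2, 0]: w1 attaches to w2, w2 attaches to ROOT
```
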
---

## Training Data

The models are trained on combinations of the following treebanks:

- **CamelTB** (MSA): [`camel_treebank_1.1.zip`](https://sites.google.com/nyu.edu/camel-treebank/resources)
- **PATB** (Penn Arabic Treebank): [LDC2010T13](https://catalog.ldc.upenn.edu/LDC2010T13), [LDC2011T09](https://catalog.ldc.upenn.edu/LDC2011T09), [LDC2010T08](https://catalog.ldc.upenn.edu/LDC2010T08)
- **ARZTB** (Egyptian Arabic Treebank): [LDC2018T23](https://catalog.ldc.upenn.edu/LDC2018T23)
- **CamelTB-Gumar** (Gulf Arabic): [`CamelTB-Gumar.1.0.zip`](https://forms.gle/54WSUt7Z9m9vk6p69)

---

## Intended Use

This model is intended for:

- Dependency parsing of Arabic text
- Linguistic analysis of dialectal Arabic

---

## Usage

For usage instructions and code, please refer to the official repository:

https://github.com/CAMeL-Lab/camel_parser_dialects

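
The checkpoint names listed in the table follow a regular pattern, so a small helper can map a set of target varieties to a checkpoint ID. The sketch below assumes the checkpoints are hosted as ordinary Hugging Face Hub repositories and uses `huggingface_hub.snapshot_download` only to fetch the files. `checkpoint_id` and `download_checkpoint` are hypothetical helper names, not part of the official code, and running the parser itself should follow the repository linked above.

```python
from typing import Iterable

ORG = "CAMeL-Lab"
PREFIX = "camelparser-dialects"
VARIETY_ORDER = ("MSA", "EGY", "GLF")  # order used in the checkpoint names


def checkpoint_id(varieties: Iterable[str]) -> str:
    """Build a checkpoint ID such as 'CAMeL-Lab/camelparser-dialects-MSA-GLF'."""
    wanted = {v.upper() for v in varieties}
    unknown = wanted - set(VARIETY_ORDER)
    if not wanted or unknown:
        raise ValueError(f"expected a non-empty subset of {VARIETY_ORDER}, got {sorted(wanted)}")
    suffix = "-".join(v for v in VARIETY_ORDER if v in wanted)
    return f"{ORG}/{PREFIX}-{suffix}"


def download_checkpoint(repo_id: str) -> str:
    """Download the repository files and return the local directory path.

    Assumption: requires the third-party `huggingface_hub` package and
    network access; imported lazily so the helper above has no dependencies.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


print(checkpoint_id(["MSA", "EGY", "GLF"]))  # CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF
```

Calling `download_checkpoint(checkpoint_id(["MSA", "EGY", "GLF"]))` would then fetch the recommended all-variety checkpoint, provided the assumption about hosting holds.
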
## Citation
If you use this model, please cite:

```bibtex
@inproceedings{Elshabrawy:2026:camelparser-dialects,
    title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
    author = {Ahmed Elshabrawy and
              Go Inoue and
              Muhammed AbuOdeh and
              Nizar Habash},
    booktitle = {Proceedings of The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)},
    year = "2026",
    address = "Palma, Spain"
}
```