---
license: mit
language:
- ar
tags:
- dependency-parsing
- arabic
- dialects
---

# CamelParser-Dialects

**CamelParser-Dialects** is a neural dependency parsing model for **dialectal Arabic** and Modern Standard Arabic (MSA) that produces analyses in the **CATiB dependency formalism**. It is based on the **biaffine attention parser** architecture introduced by Dozat and Manning (2017), implemented with [SuPar](https://github.com/yzhangcs/parser). Its encoder is **CamelBERT-MIX**, a pretrained language model trained on a large corpus that mixes MSA, dialectal Arabic, and Classical Arabic.

Full details are available in our paper: **"Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"**

---

## 📊 Model Variants and LAS (Labeled Attachment Score) on the Test Sets

||Checkpoint|Training Data|MSA|EGY|GLF|AVG|
|:-----:|:-----|:--------|:--------|:--------|:--------|:--------|
||`CAMeL-Lab/camelparser-dialects-MSA`|CamelTB, PATB|87.3|73.0|73.3|77.9|
|☑️|`CAMeL-Lab/camelparser-dialects-EGY`|ARZTB|79.2|83.9|68.7|77.3|
||`CAMeL-Lab/camelparser-dialects-GLF`|CamelTB-Gumar|65.4|58.7|73.8|66.0|
||`CAMeL-Lab/camelparser-dialects-MSA-EGY`|CamelTB, PATB, ARZTB|87.1|84.4|70.1|79.8|
||`CAMeL-Lab/camelparser-dialects-MSA-GLF`|CamelTB, PATB, CamelTB-Gumar|87.2|74.4|81.0|80.9|
||`CAMeL-Lab/camelparser-dialects-EGY-GLF`|ARZTB, CamelTB-Gumar|80.0|83.8|79.4|81.1|
||`CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF`|CamelTB, PATB, ARZTB, CamelTB-Gumar|87.2|84.2|80.3|83.9|

The recommended checkpoint is the **all-variety model (`MSA-EGY-GLF`)**, which achieves the highest average LAS (83.9) across MSA, EGY, and GLF.
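
LAS counts a token as correct only when both its predicted head and its dependency label match the gold tree; the unlabeled attachment score (UAS) checks the head alone. The snippet below is a minimal sketch of that standard definition, not the evaluation script used in the paper. The `Arc` type and `attachment_scores` function are hypothetical names, the labels in the example are illustrative placeholders, and the official evaluation may apply additional conventions (for example, for punctuation or tokenization mismatches).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One token's dependency: the index of its head and the relation label."""
    head: int   # 1-based index of the head token; 0 means the token attaches to the root
    label: str  # dependency relation label (values below are illustrative)


def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages over every token in the corpus.

    gold and pred are lists of sentences; each sentence is a list of Arc
    objects, one per token, in the same token order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = head_hits = labeled_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ; tokenization must match")
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            if g.head == p.head:
                head_hits += 1          # correct head: counts toward UAS
                if g.label == p.label:
                    labeled_hits += 1   # correct head and label: counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# Hypothetical 4-token sentence; "ROOT" is a placeholder label.
gold = [[Arc(2, "SBJ"), Arc(0, "ROOT"), Arc(2, "OBJ"), Arc(3, "MOD")]]
pred = [[Arc(2, "SBJ"), Arc(0, "ROOT"), Arc(2, "MOD"), Arc(2, "MOD")]]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # prints UAS=75.0 LAS=50.0
```

In the example, token 3 has the right head but the wrong label (it counts toward UAS only), and token 4 has the wrong head (it counts toward neither), so LAS is always less than or equal to UAS. The function raises `ValueError` on mismatched tokenization rather than silently comparing misaligned tokens.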

---

## 🧠 Model Architecture

- **Encoder**: CamelBERT-MIX
- **Parser**: Deep biaffine attention (Dozat & Manning, 2017)
- **Framework**: [SuPar](https://github.com/yzhangcs/parser)
- **Formalism**: CATiB dependency scheme

---

## 📚 Training Data

The models are trained on combinations of the following treebanks:

- **CamelTB** (MSA): [camel_treebank_1.1.zip](https://sites.google.com/nyu.edu/camel-treebank/resources)
- **PATB** (Penn Arabic Treebank): [LDC2010T13](https://catalog.ldc.upenn.edu/LDC2010T13), [LDC2011T09](https://catalog.ldc.upenn.edu/LDC2011T09), [LDC2010T08](https://catalog.ldc.upenn.edu/LDC2010T08)
- **ARZTB** (Egyptian Arabic Treebank): [LDC2018T23](https://catalog.ldc.upenn.edu/LDC2018T23)
- **CamelTB-Gumar** (Gulf Arabic): [`CamelTB-Gumar.1.0.zip`](https://forms.gle/54WSUt7Z9m9vk6p69)

---

## 🚀 Intended Use

This model is intended for:

- Dependency parsing of MSA and dialectal Arabic text in the CATiB formalism
- Linguistic analysis of dialectal Arabic

---

## 🔧 Usage

For usage instructions and code, please refer to the official repository:

👉 https://github.com/CAMeL-Lab/camel_parser_dialects

---

## 📖 Citation

If you use this model, please cite:

```bibtex
@inproceedings{Elshabrawy:2026:camelparser-dialects,
    title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
    author = {Ahmed Elshabrawy and Go Inoue and Muhammed AbuOdeh and Nizar Habash},
    booktitle = {Proceedings of The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)},
    year = "2026",
    address = "Palma, Spain"
}
```