---
license: mit
language:
- ar
tags:
- dependency-parsing
- arabic
- dialects
---
# CamelParser-Dialects
**CamelParser-Dialects** is a neural dependency parser for **dialectal Arabic** and Modern Standard Arabic (MSA) that follows the **CATiB dependency formalism**.
It is based on the **biaffine attention parser** architecture introduced by Dozat and Manning (2017) and is implemented with [SuPar](https://github.com/yzhangcs/parser).
The model uses **CamelBERT-MIX**, a language model pretrained on a large and diverse Arabic corpus, as its encoder.
Full details are available in our paper:
**"Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"**
---
## Model Variants and LAS (Labeled Attachment Score) on TEST
||Checkpoint|Training Data|MSA|EGY|GLF|AVG|
|:-----:|:-----|:--------|:--------|:--------|:--------|:--------|
||`CAMeL-Lab/camelparser-dialects-MSA`|CamelTB, PATB|87.3|73.0|73.3|77.9|
||`CAMeL-Lab/camelparser-dialects-EGY`|ARZTB|79.2|83.9|68.7|77.3|
||`CAMeL-Lab/camelparser-dialects-GLF`|CamelTB-Gumar|65.4|58.7|73.8|66.0|
||`CAMeL-Lab/camelparser-dialects-MSA-EGY`|CamelTB, PATB, ARZTB|87.1|84.4|70.1|79.8|
||`CAMeL-Lab/camelparser-dialects-MSA-GLF`|CamelTB, PATB, CamelTB-Gumar|87.2|74.4|81.0|80.9|
|⭐|`CAMeL-Lab/camelparser-dialects-EGY-GLF`|ARZTB, CamelTB-Gumar|80.0|83.8|79.4|81.1|
||`CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF`|CamelTB, PATB, ARZTB, CamelTB-Gumar|87.2|84.2|80.3|83.9|
The recommended checkpoint is the **all-variety model (`MSA-EGY-GLF`)**, which has the highest average LAS (83.9) across the MSA, EGY, and GLF test sets.
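For readers unfamiliar with the metric, the following is a minimal sketch of how a labeled attachment score can be computed: the percentage of tokens whose predicted head and relation label both match the gold analysis. The function name, the data layout, and the toy sentence are assumptions made for illustration; the exact evaluation protocol behind the table (for example, how tokenization is handled) is described in the paper, not here.

```python
from typing import List, Tuple

# One (head_index, relation_label) pair per token; head 0 denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over a sequence of tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")
    # UAS counts tokens with the correct head; LAS also requires the correct label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Hypothetical 4-token sentence; the labels are illustrative placeholders.
gold = [(2, "SBJ"), (0, "PRD"), (2, "OBJ"), (3, "MOD")]
pred = [(2, "SBJ"), (0, "PRD"), (2, "MOD"), (2, "MOD")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # prints UAS=75.0 LAS=50.0
```

In this toy example the third token has the right head but the wrong label, so it counts toward UAS only, and the fourth token has the wrong head, so it counts toward neither.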
---
## Model Architecture
- **Encoder**: CamelBERT-MIX
- **Parser**: Deep biaffine attention (Dozat & Manning, 2017)
- **Framework**: [SuPar](https://github.com/yzhangcs/parser)
- **Formalism**: CATiB dependency scheme
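To give an intuition for the biaffine arc scorer listed above, here is a toy pure-Python sketch of the scoring step: each candidate (dependent, head) pair receives a bilinear score plus a head-only bias term. This is not SuPar's implementation. The function names, the 2-dimensional vectors, and the weights are assumptions chosen so the arithmetic can be checked by hand; a real parser derives the vectors from encoder states through learned projections, uses a second biaffine classifier for relation labels, and typically enforces a well-formed tree at decoding time.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def biaffine_arc_scores(dep: Matrix, head: Matrix, U: Matrix, u: Vector) -> Matrix:
    """scores[i][j] = head[j]^T U dep[i] + head[j] . u (score of j heading i)."""
    scores = []
    for d in dep:
        # U d has the dimensionality of the head vectors.
        Ud = [sum(U[m][k] * d[k] for k in range(len(d))) for m in range(len(U))]
        # Adding u before the dot product folds in the head-only bias term.
        row = [sum(h[m] * (Ud[m] + u[m]) for m in range(len(h))) for h in head]
        scores.append(row)
    return scores


def greedy_heads(scores: Matrix) -> List[int]:
    """Pick the best head per token independently (no tree constraint)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy setup: head candidates are [root, token 1, token 2]; dependents are the two tokens.
head_vecs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
dep_vecs = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 2.0]]
u = [0.5, 0.0]

scores = biaffine_arc_scores(dep_vecs, head_vecs, U, u)
print(scores)                # [[3.0, 0.0, 1.5], [1.0, 4.0, 2.5]]
print(greedy_heads(scores))  # [0, 1]: token 1 attaches to root, token 2 to token 1
```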
---
## Training Data
The models are trained on combinations of the following treebanks:
- **CamelTB** (MSA): [camel_treebank_1.1.zip](https://sites.google.com/nyu.edu/camel-treebank/resources)
- **PATB** (Penn Arabic Treebank): [LDC2010T13](https://catalog.ldc.upenn.edu/LDC2010T13), [LDC2011T09](https://catalog.ldc.upenn.edu/LDC2011T09), [LDC2010T08](https://catalog.ldc.upenn.edu/LDC2010T08)
- **ARZTB** (Egyptian Arabic Treebank): [LDC2018T23](https://catalog.ldc.upenn.edu/LDC2018T23)
- **CamelTB-Gumar** (Gulf Arabic): [`CamelTB-Gumar.1.0.zip`](https://forms.gle/54WSUt7Z9m9vk6p69)
---
## Intended Use
This model is intended for:
- Dependency parsing of Arabic text
- Linguistic analysis of dialectal Arabic
---
## Usage
For usage instructions and code, please refer to the official repository:
https://github.com/CAMeL-Lab/camel_parser_dialects
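Before following the repository's instructions, it can help to decide which checkpoint fits the input. The sketch below is a hypothetical helper, not part of the official tooling: it copies the per-variety test LAS values from the table in this card and returns the repository ID with the highest score for a given variety. The names `TEST_LAS`, `REPO_PREFIX`, and `best_checkpoint` are assumptions introduced here.

```python
from typing import Dict

# Test-set LAS per checkpoint suffix, copied from the table in this model card.
TEST_LAS: Dict[str, Dict[str, float]] = {
    "MSA": {"MSA": 87.3, "EGY": 73.0, "GLF": 73.3},
    "EGY": {"MSA": 79.2, "EGY": 83.9, "GLF": 68.7},
    "GLF": {"MSA": 65.4, "EGY": 58.7, "GLF": 73.8},
    "MSA-EGY": {"MSA": 87.1, "EGY": 84.4, "GLF": 70.1},
    "MSA-GLF": {"MSA": 87.2, "EGY": 74.4, "GLF": 81.0},
    "EGY-GLF": {"MSA": 80.0, "EGY": 83.8, "GLF": 79.4},
    "MSA-EGY-GLF": {"MSA": 87.2, "EGY": 84.2, "GLF": 80.3},
}

REPO_PREFIX = "CAMeL-Lab/camelparser-dialects-"


def best_checkpoint(variety: str) -> str:
    """Return the repository ID with the highest test LAS for one variety."""
    if variety not in ("MSA", "EGY", "GLF"):
        raise ValueError(f"unknown variety: {variety!r}")
    suffix = max(TEST_LAS, key=lambda name: TEST_LAS[name][variety])
    return REPO_PREFIX + suffix


for variety in ("MSA", "EGY", "GLF"):
    print(variety, "->", best_checkpoint(variety))
```

The per-variety winners differ from one another by small margins, so for mixed or unknown input the all-variety `MSA-EGY-GLF` checkpoint recommended above remains the safer default.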
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{Elshabrawy:2026:camelparser-dialects,
    title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
    author = {Ahmed Elshabrawy and
              Go Inoue and
              Muhammed AbuOdeh and
              Nizar Habash},
    booktitle = {Proceedings of The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)},
    year = "2026",
    address = "Palma, Spain"
}
```