---
license: mit
language:
- ar
tags:
- dependency-parsing
- arabic
- dialects
---
# CamelParser-Dialects
**CamelParser-Dialects** is a neural dependency parser for **dialectal Arabic** and Modern Standard Arabic (MSA) that follows the **CATiB dependency formalism**.

It is based on the **biaffine attention parser** architecture introduced by Dozat and Manning (2017), implemented using [SuPar](https://github.com/yzhangcs/parser).
The model uses **CamelBERT-MIX** as its encoder, a pretrained language model trained on a large and diverse Arabic corpus that mixes MSA, dialectal, and classical Arabic.

Full details are available in our paper: 
**"Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"**

---

## 📊 Model Variants and LAS (Labeled Attachment Score) on TEST
||Checkpoint|Training Data|MSA|EGY|GLF|AVG|
|:-----:|:-----|:--------|:--------|:--------|:--------|:--------|
||`CAMeL-Lab/camelparser-dialects-MSA`|CamelTB, PATB|87.3|73.0|73.3|77.9|
||`CAMeL-Lab/camelparser-dialects-EGY`|ARZTB|79.2|83.9|68.7|77.3|
||`CAMeL-Lab/camelparser-dialects-GLF`|CamelTB-Gumar|65.4|58.7|73.8|66.0|
||`CAMeL-Lab/camelparser-dialects-MSA-EGY`|CamelTB, PATB, ARZTB|87.1|84.4|70.1|79.8|
||`CAMeL-Lab/camelparser-dialects-MSA-GLF`|CamelTB, PATB, CamelTB-Gumar|87.2|74.4|81.0|80.9|
|☑️|`CAMeL-Lab/camelparser-dialects-EGY-GLF`|ARZTB, CamelTB-Gumar|80.0|83.8|79.4|81.1|
||`CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF`|CamelTB, PATB, ARZTB, CamelTB-Gumar|87.2|84.2|80.3|83.9|
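
For readers unfamiliar with the metric, LAS is the percentage of tokens whose predicted head and dependency label both match the gold analysis; the unlabeled variant (UAS) checks the head only. The sketch below is a minimal illustration of that definition. The function name `attachment_scores` and the toy labels are illustrative assumptions, and the evaluation script behind the table may differ in details such as tokenization or punctuation handling.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over two aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens with the correct head.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the correct dependency label.
    correct_arcs = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * correct_heads / total, 100.0 * correct_arcs / total


# Toy four-token sentence; head 0 stands for the artificial root.
# The labels are placeholders chosen for illustration only.
gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ"), (3, "MOD")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "MOD"), (2, "MOD")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # UAS=75.0 LAS=50.0
```

In the toy sentence, three of four heads are correct, but only two tokens also carry the correct label, which is why LAS is never higher than UAS.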

The recommended checkpoint is the **all-variety model (`CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF`)**, which has the highest average LAS (83.9) across the MSA, EGY, and GLF test sets.
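
If you care about one variety more than the average, the best checkpoint can differ from the recommended one. The following sketch copies the LAS values from the table above into a dictionary and looks up the top checkpoint for a given column. The helper name `best_checkpoint` is hypothetical and not part of the released code.

```python
from typing import Dict, Tuple

COLUMNS = ("MSA", "EGY", "GLF", "AVG")

# LAS on TEST, copied from the table above in the order of COLUMNS.
LAS_ON_TEST: Dict[str, Tuple[float, float, float, float]] = {
    "CAMeL-Lab/camelparser-dialects-MSA": (87.3, 73.0, 73.3, 77.9),
    "CAMeL-Lab/camelparser-dialects-EGY": (79.2, 83.9, 68.7, 77.3),
    "CAMeL-Lab/camelparser-dialects-GLF": (65.4, 58.7, 73.8, 66.0),
    "CAMeL-Lab/camelparser-dialects-MSA-EGY": (87.1, 84.4, 70.1, 79.8),
    "CAMeL-Lab/camelparser-dialects-MSA-GLF": (87.2, 74.4, 81.0, 80.9),
    "CAMeL-Lab/camelparser-dialects-EGY-GLF": (80.0, 83.8, 79.4, 81.1),
    "CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF": (87.2, 84.2, 80.3, 83.9),
}


def best_checkpoint(column: str) -> str:
    """Return the checkpoint with the highest LAS in the given table column."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column {column!r}; expected one of {COLUMNS}")
    idx = COLUMNS.index(column)
    return max(LAS_ON_TEST, key=lambda name: LAS_ON_TEST[name][idx])


for column in COLUMNS:
    print(column, "->", best_checkpoint(column))
```

On these numbers the single-variety winners are the MSA, MSA-EGY, and MSA-GLF checkpoints respectively, each ahead of the all-variety model by less than one LAS point on its own test set.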

---

## 🧠 Model Architecture

- **Encoder**: CamelBERT-MIX
- **Parser**: Deep biaffine attention (Dozat & Manning, 2017)
- **Framework**: [SuPar](https://github.com/yzhangcs/parser)
- **Formalism**: CATiB dependency scheme
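
To make the parser component concrete, here is a minimal pure-Python sketch of biaffine arc scoring in the style of Dozat and Manning (2017): the score of candidate head `j` for dependent `i` is a bilinear term plus a head-only bias term. It is an illustration under simplifying assumptions, not the SuPar implementation: the vectors are tiny hand-written toys rather than encoder outputs passed through separate head and dependent MLPs, all names are hypothetical, and the greedy head choice at the end does not guarantee a well-formed tree, whereas a full parser typically applies a tree-constrained decoder.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def biaffine_arc_scores(dep: Matrix, head: Matrix, U: Matrix, u: Vector) -> Matrix:
    """scores[i][j] = head[j]^T U dep[i] + head[j] . u"""
    scores = []
    for d in dep:
        Ud = matvec(U, d)  # bilinear interaction with this dependent
        scores.append([dot(h, Ud) + dot(h, u) for h in head])
    return scores


def greedy_heads(scores: Matrix) -> List[int]:
    """Pick the highest-scoring head for each dependent independently."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy setup: two tokens as dependents, three head candidates
# (candidate 0 plays the role of the artificial root).
dep = [[1.0, 0.0], [0.0, 1.0]]
head = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
U = [[1.0, 0.0], [0.0, 2.0]]  # bilinear weight matrix
u = [0.5, 0.0]                # head-only bias vector

scores = biaffine_arc_scores(dep, head, U, u)
print(scores)                # [[1.5, 0.0, 0.75], [0.5, 2.0, 1.25]]
print(greedy_heads(scores))  # [0, 1]
```

The head-only term lets the model express that some words are likely heads regardless of the dependent, while the bilinear term captures how well a specific head and dependent fit together.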

---

## 📚 Training Data

The models are trained on combinations of the following treebanks:

- **CamelTB** (MSA): [camel_treebank_1.1.zip](https://sites.google.com/nyu.edu/camel-treebank/resources)
- **PATB** (Penn Arabic Treebank): [LDC2010T13](https://catalog.ldc.upenn.edu/LDC2010T13), [LDC2011T09](https://catalog.ldc.upenn.edu/LDC2011T09), [LDC2010T08](https://catalog.ldc.upenn.edu/LDC2010T08)
- **ARZTB** (Egyptian Arabic Treebank): [LDC2018T23](https://catalog.ldc.upenn.edu/LDC2018T23)
- **CamelTB-Gumar** (Gulf Arabic): [`CamelTB-Gumar.1.0.zip`](https://forms.gle/54WSUt7Z9m9vk6p69)

---

## 🚀 Intended Use

This model is intended for:

- Dependency parsing of Arabic text
- Linguistic analysis of dialectal Arabic

---

## 🔧 Usage

For usage instructions and code, please refer to the official repository:

👉 https://github.com/CAMeL-Lab/camel_parser_dialects

## 📖 Citation
If you use this model, please cite:

```bibtex
@inproceedings{Elshabrawy:2026:camelparser-dialects,
    title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
    author = {Ahmed Elshabrawy and
              Go Inoue and
              Muhammed AbuOdeh and
              Nizar Habash} ,
    booktitle = {Proceedings of The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)},
    year = "2026",
    address = "Palma, Spain"
}
```