go-inoue committed on
Commit c746ed0 · verified · 1 Parent(s): 4e1dbfb

Update README.md

  1. README.md +84 -3
README.md CHANGED
@@ -1,3 +1,84 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - ar
+ tags:
+ - dependency-parsing
+ - arabic
+ - dialects
+ ---
+ # CamelParser-Dialects
+ **CamelParser-Dialects** is a neural dependency parsing model for **dialectal Arabic** and Modern Standard Arabic (MSA) that follows the **CATiB dependency formalism**.
+
+ It is based on the **biaffine attention parser** architecture introduced by Dozat and Manning (2017), implemented using [SuPar](https://github.com/yzhangcs/parser).
+ The model uses **CamelBERT-MIX**, a language model pretrained on a large and diverse Arabic corpus, as its encoder.
+
+ Full details are available in our paper:
+ **"Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"**
+
+ ---
+
+ ## 📊 Model Variants and LAS (Labeled Attachment Score) on TEST
+ ||Checkpoint|Training Data|MSA|EGY|GLF|AVG|
+ |:-----:|:-----|:--------|:--------|:--------|:--------|:--------|
+ ||`CAMeL-Lab/camelparser-dialects-MSA`|CamelTB, PATB|87.3|73.0|73.3|77.9|
+ ||`CAMeL-Lab/camelparser-dialects-EGY`|ARZTB|79.2|83.9|68.7|77.3|
+ ||`CAMeL-Lab/camelparser-dialects-GLF`|CamelTB-Gumar|65.4|58.7|73.8|66.0|
+ ||`CAMeL-Lab/camelparser-dialects-MSA-EGY`|CamelTB, PATB, ARZTB|87.1|84.4|70.1|79.8|
+ |☑️|`CAMeL-Lab/camelparser-dialects-MSA-GLF`|CamelTB, PATB, CamelTB-Gumar|87.2|74.4|81.0|80.9|
+ ||`CAMeL-Lab/camelparser-dialects-EGY-GLF`|ARZTB, CamelTB-Gumar|80.0|83.8|79.4|81.1|
+ ||`CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF`|CamelTB, PATB, ARZTB, CamelTB-Gumar|87.2|84.2|80.3|83.9|
+
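The scores in the table are LAS values. As a minimal sketch of the metric, assuming its standard definition (the percentage of tokens whose predicted head and relation label both match the gold tree), it could be computed as follows. The function name `labeled_attachment_score`, the toy sentence, and the label strings are illustrative assumptions, not the paper's evaluation script, which may differ in details such as punctuation handling.

```python
from typing import List, Tuple

# One token's analysis: (head index, relation label); head 0 is the artificial root.
Arc = Tuple[int, str]

def labeled_attachment_score(gold: List[Arc], pred: List[Arc]) -> float:
    """Percentage of tokens whose head AND label both match the gold analysis."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Toy four-token sentence; heads and labels are hypothetical placeholders.
gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ"), (3, "MOD")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "MOD"), (2, "MOD")]

print(f"LAS = {labeled_attachment_score(gold, pred):.1f}")
```

In this toy case the third token has the right head but the wrong label, and the fourth has the wrong head, so only two of the four tokens count as correct.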
+ The recommended checkpoint is the **all-variety model (`MSA-EGY-GLF`)**, which achieves the highest average LAS (83.9) across the MSA, EGY, and GLF test sets.
+
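To choose a checkpoint for a single target variety rather than for overall performance, the results table can be queried directly. The sketch below copies the LAS values from the table in this README into a dictionary. The helper name `best_checkpoint` is hypothetical, and the keys omit the `CAMeL-Lab/camelparser-dialects-` prefix for brevity.

```python
# LAS on TEST, copied from the results table in this README.
LAS = {
    "MSA":         {"MSA": 87.3, "EGY": 73.0, "GLF": 73.3, "AVG": 77.9},
    "EGY":         {"MSA": 79.2, "EGY": 83.9, "GLF": 68.7, "AVG": 77.3},
    "GLF":         {"MSA": 65.4, "EGY": 58.7, "GLF": 73.8, "AVG": 66.0},
    "MSA-EGY":     {"MSA": 87.1, "EGY": 84.4, "GLF": 70.1, "AVG": 79.8},
    "MSA-GLF":     {"MSA": 87.2, "EGY": 74.4, "GLF": 81.0, "AVG": 80.9},
    "EGY-GLF":     {"MSA": 80.0, "EGY": 83.8, "GLF": 79.4, "AVG": 81.1},
    "MSA-EGY-GLF": {"MSA": 87.2, "EGY": 84.2, "GLF": 80.3, "AVG": 83.9},
}

def best_checkpoint(column: str) -> str:
    """Return the checkpoint suffix with the highest LAS in the given column."""
    return max(LAS, key=lambda name: LAS[name][column])

for column in ("MSA", "EGY", "GLF", "AVG"):
    name = best_checkpoint(column)
    print(f"{column}: CAMeL-Lab/camelparser-dialects-{name} ({LAS[name][column]})")
```

With these values, the best single-variety checkpoints beat the all-variety model by less than one LAS point on each test set (for example 87.3 vs. 87.2 on MSA), which is consistent with using `MSA-EGY-GLF` as the default.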
+ ---
+
+ ## 🧠 Model Architecture
+
+ - **Encoder**: CamelBERT-MIX
+ - **Parser**: Deep biaffine attention (Dozat & Manning, 2017)
+ - **Framework**: [SuPar](https://github.com/yzhangcs/parser)
+ - **Formalism**: CATiB dependency scheme
+
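As a rough illustration of the parser component, the sketch below computes biaffine arc scores in the spirit of Dozat and Manning (2017): each candidate (dependent, head) pair receives a bilinear term plus a head-only bias term, and each token then picks its highest-scoring head. This is a simplified pure-Python sketch under stated assumptions, not SuPar's implementation. The vectors stand in for the representations that a real model derives from the encoder through MLP layers, the weights are made-up numbers, and label scoring and tree-constrained decoding are omitted.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def arc_score(dep: Vector, head: Vector, U: Matrix, b: Vector) -> float:
    """Biaffine arc score: dep^T U head + b^T head."""
    U_head = [dot(row, head) for row in U]  # matrix-vector product U @ head
    return dot(dep, U_head) + dot(b, head)

def greedy_heads(deps: List[Vector], heads: List[Vector],
                 U: Matrix, b: Vector) -> List[int]:
    """Pick the best-scoring head for each token.

    heads[0] is the artificial root and heads[i] is token i as a head;
    deps[i - 1] is token i as a dependent.
    """
    chosen = []
    for i, dep in enumerate(deps, start=1):
        candidates = [j for j in range(len(heads)) if j != i]  # no self-attachment
        chosen.append(max(candidates, key=lambda j: arc_score(dep, heads[j], U, b)))
    return chosen

# Made-up 2-dimensional representations for a two-token sentence.
heads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # root, token 1, token 2
deps = [[0.0, 2.0], [1.0, 0.0]]               # token 1, token 2
U = [[1.0, 0.0], [0.0, 1.0]]                  # hypothetical bilinear weights
b = [0.5, 0.0]                                # hypothetical head bias

print(greedy_heads(deps, heads, U, b))  # prints [2, 0]
```

Here token 1 attaches to token 2, and token 2 attaches to the root. Greedy per-token selection does not guarantee a well-formed tree in general, which is why full parsers typically add a tree-constrained decoding step.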
+ ---
+
+ ## 📚 Training Data
+
+ The models are trained on combinations of the following treebanks:
+
+ - **CamelTB** (MSA): [camel_treebank_1.1.zip](https://sites.google.com/nyu.edu/camel-treebank/resources)
+ - **PATB** (Penn Arabic Treebank): [LDC2010T13](https://catalog.ldc.upenn.edu/LDC2010T13), [LDC2011T09](https://catalog.ldc.upenn.edu/LDC2011T09), [LDC2010T08](https://catalog.ldc.upenn.edu/LDC2010T08)
+ - **ARZTB** (Egyptian Arabic Treebank): [LDC2018T23](https://catalog.ldc.upenn.edu/LDC2018T23)
+ - **CamelTB-Gumar** (Gulf Arabic): [`CamelTB-Gumar.1.0.zip`](https://forms.gle/54WSUt7Z9m9vk6p69)
+
+ ---
+
+ ## 🚀 Intended Use
+
+ This model is intended for:
+
+ - Dependency parsing of Arabic text
+ - Linguistic analysis of dialectal Arabic
+
+ ---
+
+ ## 🔧 Usage
+
+ For usage instructions and code, please refer to the official repository:
+
+ 👉 https://github.com/CAMeL-Lab/camel_parser_dialects
+
+ ## 📖 Citation
+ If you use this model, please cite:
+
+ ```bibtex
+ @inproceedings{Elshabrawy:2026:camelparser-dialects,
+     title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
+     author = {Ahmed Elshabrawy and
+               Go Inoue and
+               Muhammed AbuOdeh and
+               Nizar Habash},
+     booktitle = {Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023)},
+     year = "2026"
+ }
+ ```