Update README.md
---
license: mit
language:
- ar
tags:
- dependency-parsing
- arabic
- dialects
---
# CamelParser-Dialects
**CamelParser-Dialects** is a neural dependency parser for **dialectal Arabic** and Modern Standard Arabic (MSA) that follows the **CATiB dependency formalism**.

It is based on the **biaffine attention parser** architecture introduced by Dozat and Manning (2017) and is implemented with [SuPar](https://github.com/yzhangcs/parser).
The encoder is **CamelBERT-MIX**, a language model pretrained on a large and diverse Arabic corpus.

Full details are available in our paper:
**"Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"**

---

## Model Variants and Labeled Attachment Score (LAS) on the Test Sets

|    | Checkpoint | Training Data | MSA | EGY | GLF | AVG |
|:--:|:-----------|:--------------|:----|:----|:----|:----|
|    | `CAMeL-Lab/camelparser-dialects-MSA` | CamelTB, PATB | 87.3 | 73.0 | 73.3 | 77.9 |
|    | `CAMeL-Lab/camelparser-dialects-EGY` | ARZTB | 79.2 | 83.9 | 68.7 | 77.3 |
|    | `CAMeL-Lab/camelparser-dialects-GLF` | CamelTB-Gumar | 65.4 | 58.7 | 73.8 | 66.0 |
|    | `CAMeL-Lab/camelparser-dialects-MSA-EGY` | CamelTB, PATB, ARZTB | 87.1 | 84.4 | 70.1 | 79.8 |
|    | `CAMeL-Lab/camelparser-dialects-MSA-GLF` | CamelTB, PATB, CamelTB-Gumar | 87.2 | 74.4 | 81.0 | 80.9 |
| ⭐ | `CAMeL-Lab/camelparser-dialects-EGY-GLF` | ARZTB, CamelTB-Gumar | 80.0 | 83.8 | 79.4 | 81.1 |
|    | `CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF` | CamelTB, PATB, ARZTB, CamelTB-Gumar | 87.2 | 84.2 | 80.3 | 83.9 |

The recommended checkpoint is the **all-variety model (`MSA-EGY-GLF`)**, which provides the best overall cross-dialect performance.

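For readers unfamiliar with the metric in the table above, here is a minimal sketch of how attachment scores are commonly computed. It assumes each token is represented as a `(head_index, relation_label)` pair. The function name `attachment_scores` and the toy labels are illustrative only, and the paper's evaluation may differ in details such as tokenization or punctuation handling.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent.

    Each argument is a flat list of (head_index, relation_label) pairs,
    one pair per token, in the same token order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")

    total = len(gold)
    # UAS counts tokens whose head is correct, regardless of the label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS additionally requires the relation label to match.
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# Toy four-token sentence (illustrative labels); head 0 is the artificial root.
gold = [(2, "SBJ"), (0, "---"), (2, "OBJ"), (3, "MOD")]
predicted = [(2, "SBJ"), (0, "---"), (2, "MOD"), (2, "MOD")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 75.0, LAS = 50.0
```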

---

## Model Architecture

- **Encoder**: CamelBERT-MIX
- **Parser**: Deep biaffine attention (Dozat & Manning, 2017)
- **Framework**: [SuPar](https://github.com/yzhangcs/parser)
- **Formalism**: CATiB dependency scheme

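As a rough illustration of the biaffine arc scorer named above, the sketch below scores every (dependent, head) pair with a bilinear term plus a head-only bias, in the style of Dozat and Manning (2017). It is a plain-Python toy with hand-picked vectors, not SuPar's implementation: the function name `biaffine_arc_scores` and all values are assumptions for illustration. A real parser also adds a ROOT token, masks padding, and decodes a well-formed tree instead of taking a per-token argmax.

```python
def biaffine_arc_scores(dep, head, weight):
    """Score each candidate head j for each dependent i.

    dep:    n vectors of size d (dependent-role representations)
    head:   n vectors of size d (head-role representations)
    weight: (d + 1) x d matrix; the last row acts as a head-only bias
    Returns an n x n list where scores[i][j] is the score of j heading i.
    """
    d = len(head[0])
    scores = []
    for dep_vec in dep:
        # Append a constant 1 so the last weight row contributes a bias term.
        augmented = list(dep_vec) + [1.0]
        # projected = augmented^T . weight  (a vector of size d)
        projected = [
            sum(augmented[a] * weight[a][k] for a in range(d + 1))
            for k in range(d)
        ]
        # Dot product of the projection with every head vector.
        scores.append([sum(projected[k] * h[k] for k in range(d)) for h in head])
    return scores


# Toy example: 3 tokens, 2-dimensional role vectors (hypothetical values).
dep = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]  # identity plus a bias row

scores = biaffine_arc_scores(dep, head, weight)
# Greedy choice of the highest-scoring head per token (no tree constraint).
greedy_heads = [max(range(len(row)), key=row.__getitem__) for row in scores]
print(scores)
print(greedy_heads)
```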

---

## Training Data

The models are trained on combinations of the following treebanks:

- **CamelTB** (MSA): [camel_treebank_1.1.zip](https://sites.google.com/nyu.edu/camel-treebank/resources)
- **PATB** (Penn Arabic Treebank): [LDC2010T13](https://catalog.ldc.upenn.edu/LDC2010T13), [LDC2011T09](https://catalog.ldc.upenn.edu/LDC2011T09), [LDC2010T08](https://catalog.ldc.upenn.edu/LDC2010T08)
- **ARZTB** (Egyptian Arabic Treebank): [LDC2018T23](https://catalog.ldc.upenn.edu/LDC2018T23)
- **CamelTB-Gumar** (Gulf Arabic): [`CamelTB-Gumar.1.0.zip`](https://forms.gle/54WSUt7Z9m9vk6p69)

---

## Intended Use

This model is intended for:

- Dependency parsing of Arabic text
- Linguistic analysis of dialectal Arabic

---

## Usage

For usage instructions and code, please refer to the official repository:

https://github.com/CAMeL-Lab/camel_parser_dialects

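Before following the repository's instructions, you may want to fetch a checkpoint locally. The sketch below is one possible way to do that, assuming the checkpoint names listed in the table are Hugging Face Hub repository IDs. The helpers `checkpoint_id` and `download_checkpoint` are hypothetical names introduced here; `snapshot_download` comes from the `huggingface_hub` package. Loading the downloaded files into the parser is not shown and is left to the official repository.

```python
# Variety codes in the order they appear in the checkpoint names.
VARIETY_ORDER = ("MSA", "EGY", "GLF")


def checkpoint_id(varieties):
    """Build the repository ID for the checkpoint trained on the given varieties."""
    wanted = {v.upper() for v in varieties}
    unknown = wanted - set(VARIETY_ORDER)
    if not wanted or unknown:
        raise ValueError(f"expected a non-empty subset of {VARIETY_ORDER}")
    suffix = "-".join(v for v in VARIETY_ORDER if v in wanted)
    return f"CAMeL-Lab/camelparser-dialects-{suffix}"


def download_checkpoint(varieties, cache_dir=None):
    """Download every file of the chosen checkpoint and return the local path.

    Assumes the checkpoint is hosted on the Hugging Face Hub and that
    huggingface_hub is installed (pip install huggingface_hub).
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_id(varieties), cache_dir=cache_dir)


# The recommended all-variety model:
print(checkpoint_id(["MSA", "EGY", "GLF"]))
```

Calling `download_checkpoint(["MSA", "EGY", "GLF"])` requires network access and returns the directory holding the downloaded files.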
## Citation
If you use this model, please cite:

```bibtex
@inproceedings{Elshabrawy:2026:camelparser-dialects,
  title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
  author = {Ahmed Elshabrawy and
            Go Inoue and
            Muhammed AbuOdeh and
            Nizar Habash},
  booktitle = {Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023)},
  year = "2026"
}
```