Sanadshabann committed on
Commit e5d4849 · verified · 1 Parent(s): 803bf75

Update README.md

Files changed (1)
  1. README.md +17 -9
README.md CHANGED
@@ -112,13 +112,21 @@ arXiv:2508.17347 (2025). *Accepted to EMNLP 2025 Main Conference.*
 
  ### BibTeX
  ```bibtex
- @misc{shaban2025ags,
- title = {The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness},
- author = {Shaban, Sanad and Habash, Nizar},
- year = {2025},
- eprint = {2508.17347},
- archivePrefix = {arXiv},
- primaryClass = {cs.CL},
- note = {Accepted to EMNLP 2025 Main Conference},
- url = {https://arxiv.org/abs/2508.17347}
+ @inproceedings{shaban-habash-2025-arabic,
+ title = "The {A}rabic Generality Score: Another Dimension of Modeling {A}rabic Dialectness",
+ author = "Sha{'}ban, Sanad and
+ Habash, Nizar",
+ editor = "Christodoulopoulos, Christos and
+ Chakraborty, Tanmoy and
+ Rose, Carolyn and
+ Peng, Violet",
+ booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+ month = nov,
+ year = "2025",
+ address = "Suzhou, China",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2025.emnlp-main.1524/",
+ pages = "29990--30001",
+ ISBN = "979-8-89176-332-6",
+ abstract = "Arabic dialects form a diverse continuum, yet NLP models often treat them as discrete categories. Recent work addresses this issue by modeling dialectness as a continuous variable, notably through the Arabic Level of Dialectness (ALDi). However, ALDi reduces complex variation to a single dimension. We propose a complementary measure: the Arabic Generality Score (AGS), which quantifies how widely a word is used across dialects. We introduce a pipeline that combines word alignment, etymology-aware edit distance, and smoothing to annotate a parallel corpus with word-level AGS. A regression model is then trained to predict AGS in context. Our approach outperforms strong baselines, including state-of-the-art dialect ID systems, on a multi-dialect benchmark. AGS offers a scalable, linguistically grounded way to model lexical generality, enriching representations of Arabic dialectness. Code is publicly available at https://github.com/CAMeL-Lab/arabic-generality-score."
  }