Sanadshabann/AGS

Text Classification

text-embeddings-inference

Sanadshabann committed on Nov 7, 2025

Commit e5d4849 · verified · 1 Parent(s): 803bf75

Update README.md

Files changed (1)

README.md +17 -9

@@ -112,13 +112,21 @@ arXiv:2508.17347 (2025). *Accepted to EMNLP 2025 Main Conference.*
 ### BibTeX
 ```bibtex
-@misc{shaban2025ags,
-  title         = {The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness},
-  author        = {Shaban, Sanad and Habash, Nizar},
-  year          = {2025},
-  eprint        = {2508.17347},
-  archivePrefix = {arXiv},
-  primaryClass  = {cs.CL},
-  note          = {Accepted to EMNLP 2025 Main Conference},
-  url           = {https://arxiv.org/abs/2508.17347}
+@inproceedings{shaban-habash-2025-arabic,
+    title = "The {A}rabic Generality Score: Another Dimension of Modeling {A}rabic Dialectness",
+    author = "Sha{'}ban, Sanad  and
+      Habash, Nizar",
+    editor = "Christodoulopoulos, Christos  and
+      Chakraborty, Tanmoy  and
+      Rose, Carolyn  and
+      Peng, Violet",
+    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+    month = nov,
+    year = "2025",
+    address = "Suzhou, China",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.emnlp-main.1524/",
+    pages = "29990--30001",
+    ISBN = "979-8-89176-332-6",
+    abstract = "Arabic dialects form a diverse continuum, yet NLP models often treat them as discrete categories. Recent work addresses this issue by modeling dialectness as a continuous variable, notably through the Arabic Level of Dialectness (ALDi). However, ALDi reduces complex variation to a single dimension. We propose a complementary measure: the Arabic Generality Score (AGS), which quantifies how widely a word is used across dialects. We introduce a pipeline that combines word alignment, etymology-aware edit distance, and smoothing to annotate a parallel corpus with word-level AGS. A regression model is then trained to predict AGS in context. Our approach outperforms strong baselines, including state-of-the-art dialect ID systems, on a multi-dialect benchmark. AGS offers a scalable, linguistically grounded way to model lexical generality, enriching representations of Arabic dialectness. Code is publicly available at https://github.com/CAMeL-Lab/arabic-generality-score."
 }