The models are trained on Wikipedia data, which may not represent all varieties of Arabic equally. The diacritization process, while state-of-the-art, may introduce errors or biases into the training data.
The subword tokenization approach used in the mlm_p2g_non_diacritics model has limitations for phonemic modeling as noted above.
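To make the limitation concrete, here is a toy sketch (not the actual model's tokenizer) of why subword units can be a poor fit for phoneme-level modeling: a greedy longest-match segmenter, similar in spirit to WordPiece, produces units that cut across phoneme boundaries, whereas a character-level scheme keeps one symbol per phoneme. The vocabulary and the romanized phoneme string below are invented for illustration only.

```python
def subword_tokenize(text, vocab):
    """Greedy longest-match segmentation over a phoneme string
    (a simplified stand-in for real subword tokenizers)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# Hypothetical romanized phoneme string and subword vocabulary.
phonemes = "kitaab"
vocab = {"ki", "ta", "ab", "kit", "aa"}

# Subword units do not align one-to-one with phonemes:
print(subword_tokenize(phonemes, vocab))  # ['kit', 'aa', 'b']
# Character-level tokenization keeps phoneme boundaries intact:
print(list(phonemes))                     # ['k', 'i', 't', 'a', 'a', 'b']
```

A character-level (or phoneme-level) vocabulary avoids this mismatch, which is one reason character-based models are often preferred for p2g/g2p tasks.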
## Citation

**BibTeX:**

```bibtex
@article{catt2024,
  title={CATT: Character-based Arabic Tashkeel Transformer},
  author={Alasmary, Faris and Zaafarani, Orjuwan and Ghannam, Ahmad},
  journal={arXiv preprint arXiv:2407.03236},
  year={2024}
}

@article{plbert2023,
  title={Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions},
  author={Li, Yinghao Aaron and Han, Cong and Jiang, Xilin and Mesgarani, Nima},
  journal={arXiv preprint arXiv:2301.08810},
  year={2023}
}
```