Commit dcbf85b (verified) by fadi77 · 1 parent: a556fa8

Update README.md

Files changed (1): README.md +20 -1
README.md CHANGED
@@ -85,4 +85,23 @@ For examples on how these models can be used in code, take a look at: https://gi
 
  The models are trained on Wikipedia data, which may not represent all varieties of Arabic equally. The diacritization process, while state-of-the-art, may introduce some errors or biases in the training data.
 
- The subword tokenization approach used in the mlm_p2g_non_diacritics model has limitations for phonemic modeling as noted above.
+ The subword tokenization approach used in the mlm_p2g_non_diacritics model has limitations for phonemic modeling as noted above.
+
+ ## Citation
+
+ **BibTeX:**
+ ```bibtex
+ @article{catt2024,
+ title={CATT: Character-based Arabic Tashkeel Transformer},
+ author={Alasmary, Faris and Zaafarani, Orjuwan and Ghannam, Ahmad},
+ journal={arXiv preprint arXiv:2407.03236},
+ year={2024}
+ }
+
+ @article{plbert2023,
+ title={Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions},
+ author={Li, Yinghao Aaron and Han, Cong and Jiang, Xilin and Mesgarani, Nima},
+ journal={arXiv preprint arXiv:2301.08810},
+ year={2023}
+ }
+ ```