The models are trained on Wikipedia data, which may not represent all varieties of Arabic equally. The diacritization process, while state-of-the-art, may introduce errors or biases into the training data.
The subword tokenization approach used in the mlm_p2g_non_diacritics model has limitations for phonemic modeling as noted above.
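To make the limitation concrete, here is a toy sketch (not the actual model's tokenizer) of why subword units can be a poor fit for phoneme-level modeling: a greedy longest-match segmenter, similar in spirit to WordPiece, produces units that cut across phoneme boundaries, whereas a character-level scheme keeps one symbol per phoneme. The vocabulary and the romanized phoneme string below are invented for illustration only.

```python
def subword_tokenize(text, vocab):
    """Greedy longest-match segmentation over a phoneme string
    (a simplified stand-in for real subword tokenizers)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# Hypothetical romanized phoneme string and subword vocabulary.
phonemes = "kitaab"
vocab = {"ki", "ta", "ab", "kit", "aa"}

# Subword units do not align one-to-one with phonemes:
print(subword_tokenize(phonemes, vocab))  # ['kit', 'aa', 'b']
# Character-level tokenization keeps phoneme boundaries intact:
print(list(phonemes))                     # ['k', 'i', 't', 'a', 'a', 'b']
```

A character-level (or phoneme-level) vocabulary avoids this mismatch, which is one reason character-based models are often preferred for p2g/g2p tasks.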
## Citation

**BibTeX:**

```bibtex
@article{catt2024,
  title={CATT: Character-based Arabic Tashkeel Transformer},
  author={Alasmary, Faris and Zaafarani, Orjuwan and Ghannam, Ahmad},
  journal={arXiv preprint arXiv:2407.03236},
  year={2024}
}

@article{plbert2023,
  title={Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions},
  author={Li, Yinghao Aaron and Han, Cong and Jiang, Xilin and Mesgarani, Nima},
  journal={arXiv preprint arXiv:2301.08810},
  year={2023}
}
```