updated related work
README.md
CHANGED
@@ -120,7 +120,12 @@ print(tokenizer.encode(text))
 - M. Velayuthan and K. Sarveswaran, “Egalitarian Language Representation in Language Models: It All Begins with Tokenizers,” COLING 2025.
 arXiv:2409.11501 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.2409.11501

+- Unicode Normalization and Grapheme Parsing of Indic Languages, 2023. [Online]. Available: https://arxiv.org/abs/2306.01743
+
 - M. K. H. and A. Giri, “Orthographic Syllable Pair Encoding for Language Modelling Tasks in Indic Languages,” in 2023 IEEE MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, pp. 1–6, 2023.
 DOI: https://doi.org/10.1109/URTC60662.2023.10534970

+- R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
+[Online]. Available: https://arxiv.org/abs/1508.07909
+