varuni committed on
Commit 6a71af3 · verified · 1 Parent(s): f3149d7

updated related work

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -120,7 +120,12 @@ print(tokenizer.encode(text))
  - M. Velayuthan and K. Sarveswaran, “Egalitarian Language Representation in Language Models: It All Begins with Tokenizers,” COLING 2025.
  arXiv:2409.11501 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.2409.11501
 
+ - Unicode Normalization and Grapheme Parsing of Indic Languages, 2023. [Online]. Available: https://arxiv.org/abs/2306.01743
+
  - M. K. H. and A. Giri, “Orthographic Syllable Pair Encoding for Language Modelling Tasks in Indic Languages,” in 2023 IEEE MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, pp. 1–6, 2023.
  DOI: https://doi.org/10.1109/URTC60662.2023.10534970
 
+ - R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
+ [Online]. Available: https://arxiv.org/abs/1508.07909
+