Commit e1766ee (verified) · 1 parent: 06c4362
Committed by bytesizedllm

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -1,7 +1,7 @@
 For this study, we fine-tuned the base version of XLM-RoBERTa using Masked Language Modeling (MLM) to adapt it for handling transliteration and code-switching in a Malayalam-English dataset. The MLM task involves randomly masking a subset of input tokens and training the model to predict these masked tokens based on their context, allowing the model to learn enriched contextual embeddings tailored to the linguistic challenges of bilingual text.

 To adapt XLM-RoBERTa effectively, the MLM training dataset was constructed from three key components:
-1. Original data: Contains monolingual text from Malayalam social media sources.
+1. Original data: Contains monolingual text from Malayalam AI4Bharath.

 2. Fully transliterated data: All words in the original data were transliterated into Roman script.
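The masking step the README describes can be sketched in plain Python. This is a minimal illustration, not the study's actual pipeline: the 15% masking rate is the common default (the README does not state the rate used), the `<mask>` token mirrors RoBERTa-style tokenizers, and `None` stands in for the ignore index (typically -100) used when computing the MLM loss.

```python
import random

MASK_TOKEN = "<mask>"   # RoBERTa-style mask token; illustrative only
MASK_PROB = 0.15        # assumed rate; the README does not specify one

def mask_tokens(tokens, mask_prob=MASK_PROB, seed=1):
    """Randomly replace a subset of tokens with the mask token.

    Returns (masked_tokens, labels). labels holds the original token at
    each masked position and None elsewhere; None plays the role of the
    ignore index, so the loss is computed only on masked positions.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)   # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the MLM loss
    return masked, labels

# Example on a romanized, code-switched Malayalam-English sentence
# (hypothetical sample, not from the study's dataset):
tokens = "ente phone charge theernnu so I will call later".split()
masked, labels = mask_tokens(tokens)
```

In the real setup, a subword tokenizer and a data collator (e.g. `DataCollatorForLanguageModeling` in Hugging Face `transformers`) handle this masking on token IDs rather than whole words, but the prediction target is the same: recover the original token at each masked position from its bilingual context.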