bytesizedllm committed (verified)
Commit 0a8f61a · Parent(s): d0c95b1

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -1,4 +1,4 @@
- For this study, we fine-tuned the base version of XLM-RoBERTa using Masked Language Modeling (MLM) to adapt it for handling transliteration and code-switching in Tamil-English dataset. The MLM task involves randomly masking a subset of input tokens and training the model to predict these masked tokens based on their context, allowing the model to learn enriched contextual embeddings tailored to the linguistic challenges of bilingual text.

  To adapt XLM-RoBERTa effectively, the MLM training dataset was constructed from three key components:
  1. Original data: Contains monolingual text from Tamil and Malayalam social media sources.
 
+ We fine-tuned the base version of XLM-RoBERTa using Masked Language Modeling (MLM) to adapt it for handling transliteration and code-switching in a Tamil-English dataset. The MLM task randomly masks a subset of input tokens and trains the model to predict them from their context, allowing the model to learn enriched contextual embeddings tailored to the linguistic challenges of bilingual text.

  To adapt XLM-RoBERTa effectively, the MLM training dataset was constructed from three key components:
  1. Original data: Contains monolingual text from Tamil and Malayalam social media sources.
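The masking step described above can be sketched in plain Python. This is a minimal illustration of BERT-style MLM masking (15% of positions selected; of those, 80% replaced with the mask token, 10% with a random token, 10% left unchanged), not the training code used here; the `mask_id` and `vocab_size` values are placeholders, and in practice a library such as Hugging Face `transformers` performs this step with its MLM data collator.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=0):
    """BERT-style MLM masking.

    Returns (inputs, labels): labels are -100 (ignored by the loss)
    everywhere except at masked positions, where they hold the
    original token the model must predict.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:          # select ~15% of positions
            labels[i] = tok                  # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id          # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # else 10%: keep the original token unchanged
    return inputs, labels
```

Keeping 10% of the selected tokens unchanged forces the model to build useful representations for every input position, not only those showing a literal mask token.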