What sampling technique did you use, and how did you pack your dataset? How is multilingual data handled, and did you use additional special tokens to mark language boundaries?

#1
by Sibadatta - opened

I am running into the "curse of multilinguality" while reproducing the steps described in the technical reports of the Pragna 1B model.
When I give the model input in Gujarati, I get output in Hindi, although the output is still related to the Gujarati context.
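
To make the symptom measurable, here is a minimal sketch of how I check for the mismatch. It assumes the script of the text is a good enough proxy for the language (Gujarati script for Gujarati, Devanagari for Hindi), which is only an approximation since Devanagari is also used for other languages such as Marathi. The helper names `dominant_script` and `is_script_mismatch` are my own and do not come from the Pragna report.

```python
from collections import Counter

# Unicode block ranges for the two scripts involved (assumption: script ~ language).
SCRIPT_RANGES = {
    "gujarati": (0x0A80, 0x0AFF),
    "devanagari": (0x0900, 0x097F),
}


def dominant_script(text):
    """Return the script with the most characters in `text`, or None if neither appears."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[name] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def is_script_mismatch(prompt, output):
    """True when prompt and output are each dominated by a different known script."""
    prompt_script = dominant_script(prompt)
    output_script = dominant_script(output)
    return (
        prompt_script is not None
        and output_script is not None
        and prompt_script != output_script
    )
```

Running `is_script_mismatch(prompt, generated_text)` over a set of Gujarati prompts gives a rough rate of how often the model answers in Devanagari instead.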

Sibadatta changed discussion status to closed
Sibadatta changed discussion status to open
