What sampling technique did you use, and how did you pack your dataset? How is multilingual data handled, and did you use additional special tokens to mark language boundaries?

by Sibadatta - opened Sep 10, 2025

Sep 10, 2025

I am running into the "curse of multilinguality" while reproducing the steps described in the technical report of the Pragna 1B model.
When I give the model input in Gujarati, it responds in Hindi, although the output is relevant to the Gujarati context.
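
To make the question concrete, here is a minimal sketch of the three pieces being asked about: exponent-smoothed language sampling, a per-document language tag, and sequence packing. None of this is confirmed to be what the Pragna authors did. The `alpha` value, the tag token IDs in `LANG_TAG_IDS`, and the function names are all hypothetical and only illustrate the kind of setup the question refers to.

```python
def sampling_probs(doc_counts, alpha=0.3):
    """Exponent-smoothed sampling: p_lang is proportional to (n_lang / N) ** alpha.

    alpha=1.0 keeps the natural data proportions; smaller alpha upsamples
    low-resource languages. The default 0.3 is an assumption, not a known setting.
    """
    total = sum(doc_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in doc_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Hypothetical special-token IDs marking the language of each document.
LANG_TAG_IDS = {"hi": 32000, "gu": 32001}


def add_lang_tag(lang, token_ids):
    """Prepend the (hypothetical) language tag token to a tokenized document."""
    return [LANG_TAG_IDS[lang]] + list(token_ids)


def pack(token_seqs, block_size, eos_id):
    """Concatenate documents with an EOS token after each one,
    then cut the stream into fixed-size blocks (the trailing remainder is dropped)."""
    stream = []
    for seq in token_seqs:
        stream.extend(seq)
        stream.append(eos_id)
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


if __name__ == "__main__":
    probs = sampling_probs({"hi": 900, "gu": 100})
    print(probs)  # Gujarati is sampled more often than its raw 10% share

    docs = [add_lang_tag("gu", [1, 2, 3]), add_lang_tag("hi", [4, 5])]
    print(pack(docs, block_size=4, eos_id=0))
```

Knowing which of these choices differs from the original recipe (the smoothing exponent, whether language tags were used at all, and whether documents of different languages share a packed block) would help narrow down why Gujarati prompts drift into Hindi.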


