mT5-small for Nepali Punctuation Restoration
This is an mT5-small model fine-tuned for Nepali punctuation restoration. We use it in our Nepali-to-English translation pipeline to restore punctuation in the transcripts produced by wav2vec2-xls-r-300m.
The model is trained specifically to take raw, unsegmented Nepali text (no punctuation, no whitespace) and output well-formed text with proper spacing and punctuation.
Model Details 🔧
Base model: google/mt5-small
Fine-tuned for: Punctuation and whitespace restoration in Nepali
Input format: Raw Nepali text (no spaces, no punctuation)
Output format: Nepali text with spaces and punctuation
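Because the model expects input with no spaces and no punctuation, text from other sources (for example, a transcript that already contains word spaces) has to be normalized first. Below is a minimal Python sketch of that normalization. It assumes that "punctuation" means any character in a Unicode punctuation category, which includes the Devanagari danda "।" and double danda "॥". The function name to_model_input and the extra "|" rule are illustrative assumptions, not part of the released model.

```python
import unicodedata

# Assumption: some Nepali text uses an ASCII "|" in place of the danda "।";
# "|" is a math symbol (category Sm) in Unicode, so it is listed explicitly.
EXTRA_STRIP = {"|"}


def to_model_input(text: str) -> str:
    """Drop every whitespace and punctuation character from `text`."""
    kept = []
    for ch in text:
        if ch.isspace() or ch in EXTRA_STRIP:
            continue
        # General categories starting with "P" are punctuation: this covers the
        # Devanagari danda "।" (U+0964), double danda "॥" (U+0965), and ASCII , . ? !
        if unicodedata.category(ch).startswith("P"):
            continue
        kept.append(ch)  # Devanagari letters, vowel signs, and viramas are kept as-is
    return "".join(kept)
```

For example, to_model_input("मेरो नाम अर्जुन हो।") returns "मेरोनामअर्जुनहो".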
Example
Input (no spaces, no punctuation):
मेरोनामअर्जुनहोमकाठमाडौंबसेकोछु
Output (spaced and punctuated):
मेरो नाम अर्जुन हो। म काठमाडौं बसेको छु।
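A minimal inference sketch using the Hugging Face transformers Auto classes follows. It assumes the checkpoint iamTangsang/nepali-punctuation-restoration-mt5-final loads as a standard seq2seq model and takes the unspaced text directly, with no task prefix. The beam count, truncation length, and max_new_tokens values are illustrative guesses, not settings documented for this model. Imports happen inside the functions so the module can be loaded without transformers or torch installed.

```python
from functools import lru_cache

# Repository ID from this model card's model tree.
MODEL_ID = "iamTangsang/nepali-punctuation-restoration-mt5-final"


@lru_cache(maxsize=1)
def load_model(model_id: str = MODEL_ID):
    """Download (on first call) and cache the tokenizer and model."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def restore_punctuation(text: str, max_new_tokens: int = 256, num_beams: int = 4) -> str:
    """Return the model's spaced, punctuated version of unspaced Nepali `text`."""
    import torch

    tokenizer, model = load_model()
    # Assumption: 512 tokens as a safe input limit; longer inputs are truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling restore_punctuation("मेरोनामअर्जुनहोमकाठमाडौंबसेकोछु") downloads the checkpoint on first use and returns the model's punctuated output; lru_cache keeps the tokenizer and model in memory for later calls. Inputs that still contain spaces or punctuation should be normalized to the unspaced format first.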
Training
- Training data: sharad461/ne-en-parallel-208k. All punctuation and whitespace were removed from the Nepali text in this dataset to form the inputs, and the original punctuated sentences were used as the targets.
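The input/target construction described above can be sketched as follows. This sketch assumes each row of the parallel corpus is a dict holding the Nepali sentence under a key named "ne" (a hypothetical column name; check the dataset's actual schema). It applies a punctuation-and-whitespace stripping rule based on Unicode categories and skips rows whose input becomes empty. The empty-row filter is an illustrative choice, not something the training description specifies.

```python
import unicodedata
from typing import Iterable, Iterator


def strip_for_source(text: str) -> str:
    """Remove whitespace and Unicode punctuation (including "।") from `text`."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def build_pairs(rows: Iterable[dict], column: str = "ne") -> Iterator[dict]:
    """Yield {"input": unpunctuated, "target": original} pairs.

    `column` is a hypothetical field name for the Nepali side of the corpus.
    """
    for row in rows:
        target = row[column].strip()      # punctuated sentence used as the target
        source = strip_for_source(target)  # unspaced, unpunctuated model input
        if source:  # skip rows that contained only punctuation/whitespace
            yield {"input": source, "target": target}
```

With the datasets library, the same per-row logic could be applied through Dataset.map before tokenization.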
Model tree
Model: iamTangsang/nepali-punctuation-restoration-mt5-final
Base model: google/mt5-small