
mT5-small for Nepali Punctuation Restoration

This is an mT5-small model fine-tuned for punctuation restoration. We use it in our Nepali-to-English translation pipeline to restore punctuation in the output of wav2vec2-xls-r-300m.

This model is trained specifically to take raw, unsegmented Nepali text (no punctuation, no whitespace) and output well-formed text with proper spacing and punctuation.

Model Details 🔧

Base model: google/mt5-small
Fine-tuned for: Punctuation and whitespace restoration in Nepali
Input format: Raw Nepali text (no spaces, no punctuation)
Output format: Nepali text with spaces and punctuation
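The card does not include usage code, so the following is only a minimal sketch of how inference might look with the standard Hugging Face transformers seq2seq API. It assumes the repository ships its tokenizer files (if it does not, loading the base google/mt5-small tokenizer is a reasonable guess). The helper names to_raw_input, load_model and restore are hypothetical, and the generation settings (max_new_tokens=256, num_beams=4) are illustrative, not the authors' settings. Because the model expects unsegmented input, the sketch first strips whitespace and punctuation from whatever text it receives.

```python
import unicodedata

# Hub ID taken from this model card; the helpers and settings below are illustrative assumptions.
MODEL_ID = "iamTangsang/nepali-punctuation-restoration-mt5-final"


def to_raw_input(text: str) -> str:
    """Hypothetical helper: drop all whitespace and Unicode punctuation (category P*,
    which includes the Devanagari danda) so the text matches the unsegmented input format."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def load_model(model_id: str = MODEL_ID):
    # Imported lazily so to_raw_input can be used without transformers installed.
    # Assumes the repo contains tokenizer files alongside the Safetensors weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def restore(text: str, tokenizer, model, max_new_tokens: int = 256, num_beams: int = 4) -> str:
    """Hypothetical helper: return a spaced, punctuated version of `text`."""
    inputs = tokenizer(to_raw_input(text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

In a pipeline, the model would be loaded once with load_model() and each ASR transcript then passed through restore(text, tokenizer, model). Normalizing inside restore keeps inference inputs in the same format the model saw during training, even if the upstream transcript already contains spaces.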

Example

Input (No spaces, no punctuation):

मेरोनामअर्जुनहोममकाठमाडौंबसेकोछु

Output (Structured and punctuated):

मेरो नाम अर्जुन हो। म काठमाडौं बसेको छु।

Training

  • Training data: sharad461/ne-en-parallel-208k. All punctuation and whitespace were removed from the Nepali text in this dataset to form the inputs, and the original punctuated sentences were used as the targets.
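The preprocessing script itself is not published, so this is only a sketch of how the input/target pairs described above could be built. It assumes that "punctuation" means Unicode punctuation (category P*, which covers the Devanagari danda "।" and double danda "॥"), that the Nepali side of the dataset is available as an iterable of strings (the dataset's column names are not assumed here), and that empty or punctuation-only lines are skipped. The function names are hypothetical.

```python
import unicodedata
from typing import Iterable, List, Tuple


def strip_spaces_and_punct(text: str) -> str:
    # Hypothetical helper: remove every whitespace character and every Unicode
    # punctuation character (category P*), including the danda and double danda.
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def build_pairs(sentences: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn punctuated Nepali sentences into (input, target) pairs."""
    pairs = []
    for sentence in sentences:
        target = sentence.strip()  # the original punctuated text is the target
        source = strip_spaces_and_punct(target)  # unsegmented, unpunctuated input
        if source:  # assumption: skip lines that are empty or punctuation-only
            pairs.append((source, target))
    return pairs
```

Because the input is derived deterministically from the target, removing spaces and punctuation from any target recovers its input exactly; the model only has to learn where to put them back.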

Model Files

Format: Safetensors
Model size: 0.3B params
Tensor type: F32

Hub ID: iamTangsang/nepali-punctuation-restoration-mt5-final (fine-tuned from google/mt5-small)