mT5-small for Nepali Punctuation Restoration

This is an mT5-small model fine-tuned for punctuation restoration. It is used in our Nepali-to-English translation pipeline to restore punctuation in the output of wav2vec2-xls-r-300m.

The model is trained to take raw, unsegmented Nepali text (no punctuation, no whitespace) and output well-formed text with proper spacing and punctuation.

Model Details 🔧

Base model: google/mt5-small

Fine-tuned for: Punctuation and whitespace restoration in Nepali

Input format: Raw Nepali text (no spaces, no punctuation)

Output format: Nepali text with spaces and punctuation

Example

Input (No spaces, no punctuation):

मेरोनामअर्जुनहोमकाठमाडौंबसेकोछु

Output (Structured and punctuated):

मेरो नाम अर्जुन हो। म काठमाडौं बसेको छु।
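The example above can be reproduced with a short inference script. What follows is a minimal sketch, not the authors' own code: it assumes the checkpoint is published as iamTangsang/nepali-punctuation-restoration-mt5-final, that it loads through the standard Hugging Face `transformers` seq2seq classes, that no task prefix is required, and that `transformers`, `torch`, and `sentencepiece` are installed. The helper names and the `max_new_tokens` default are illustrative.

```python
def strip_whitespace(text: str) -> str:
    """Remove all whitespace so the text matches the raw input format described above."""
    return "".join(text.split())


def restore_punctuation(
    text: str,
    model_id: str = "iamTangsang/nepali-punctuation-restoration-mt5-final",
    max_new_tokens: int = 128,  # illustrative default, not taken from the model card
) -> str:
    """Run the fine-tuned model on raw Nepali text and return its decoded output."""
    # Imported lazily so strip_whitespace stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the model takes the unsegmented text directly, with no task prefix.
    inputs = tokenizer(strip_whitespace(text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `restore_punctuation` on the unsegmented input should return text in the spaced, punctuated form shown above, though the exact output depends on the checkpoint and generation settings. For long transcripts it may be worth loading the tokenizer and model once and reusing them instead of reloading on every call.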

Training

Training data: sharad461/ne-en-parallel-208k. All punctuation and whitespace were removed from the Nepali text in this dataset to form the model inputs, and the original punctuated sentences were used as targets.
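The pair construction described above can be sketched as follows. The original preprocessing script is not shown here, so this is an assumption-laden illustration: "punctuation" is taken to mean any character in a Unicode punctuation category (which includes the Devanagari danda), and the function names are hypothetical.

```python
import unicodedata


def strip_spacing_and_punctuation(text: str) -> str:
    """Drop every whitespace character and every Unicode punctuation character."""
    return "".join(
        ch
        for ch in text
        # Assumption: punctuation = Unicode general categories starting with "P".
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def make_training_pair(sentence: str) -> tuple[str, str]:
    """Return (model input, target) for one punctuated Nepali sentence."""
    return strip_spacing_and_punctuation(sentence), sentence


source, target = make_training_pair("मेरो नाम अर्जुन हो। म काठमाडौं बसेको छु।")
print(source)  # unsegmented text used as the model input
print(target)  # original sentence used as the training target
```

Vowel signs and other combining marks fall under Unicode mark categories, not punctuation, so they are preserved and the Devanagari text itself is left intact.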

Model: iamTangsang/nepali-punctuation-restoration-mt5-final (fine-tuned from google/mt5-small)

Model size: 0.3B params, tensor type F32 (Safetensors)