mT5-small ITN Ukrainian

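Inverse Text Normalization (ITN) model for Ukrainian, fine-tuned from google/mt5-small. It converts spoken-form text, such as numbers, percentages, and currency amounts written out in words, into its written form.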

For fast CPU inference, use the CTranslate2 version: Mikhailo/mT5-small-ITN-uk-ct2
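A minimal sketch of how the CTranslate2 version might be called, following the general T5 pattern from the CTranslate2 documentation. Several things here are assumptions rather than facts from this card: that the converted model files sit at the root of the Mikhailo/mT5-small-ITN-uk-ct2 repository, that the tokenizer from the PyTorch repository can be reused, and the helper names `itn_ct2` and `load_ct2`, which are hypothetical.

```python
def itn_ct2(translator, tokenizer, text, beam_size=4, max_len=140):
    """Run one ITN request through a CTranslate2 translator."""
    # CTranslate2 consumes token strings rather than token ids.
    source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    results = translator.translate_batch(
        [source], beam_size=beam_size, max_decoding_length=max_len
    )
    # Take the best hypothesis and map its tokens back to text.
    target = results[0].hypotheses[0]
    return tokenizer.decode(
        tokenizer.convert_tokens_to_ids(target), skip_special_tokens=True
    )


def load_ct2(repo_id="Mikhailo/mT5-small-ITN-uk-ct2"):
    """Download the converted model and build a CPU translator."""
    # Local imports keep itn_ct2 usable without these packages installed.
    import ctranslate2
    from huggingface_hub import snapshot_download
    from transformers import AutoTokenizer

    # Assumption: the CTranslate2 files are at the repository root.
    model_dir = snapshot_download(repo_id)
    translator = ctranslate2.Translator(model_dir, device="cpu")
    # Assumption: the tokenizer of the PyTorch model matches the converted one.
    tokenizer = AutoTokenizer.from_pretrained("Mikhailo/mT5-small-ITN-uk")
    return translator, tokenizer
```

With both packages installed, `translator, tokenizer = load_ct2()` followed by `itn_ct2(translator, tokenizer, "сорок два відсотки населення")` would run a single request. The beam size and length limit mirror the values used in the Usage section.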

Usage

from transformers import MT5ForConditionalGeneration, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("Mikhailo/mT5-small-ITN-uk")
model = MT5ForConditionalGeneration.from_pretrained("Mikhailo/mT5-small-ITN-uk").eval()

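def itn(text):
    # Tokenize the spoken-form input into PyTorch tensors.
    inp = tok(text, return_tensors="pt")
    # Beam search without gradient tracking; output is capped at 140 new tokens.
    with torch.no_grad():
        out = model.generate(**inp, max_new_tokens=140, num_beams=4)
    # Decode the best beam, dropping special tokens such as padding and EOS.
    return tok.decode(out[0], skip_special_tokens=True)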

print(itn("сорок два відсотки населення"))   # → 42% населення
print(itn("пʼятнадцять доларів і двадцять центів"))  # → $15.20
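For more than a handful of inputs, batching usually helps. The following is a sketch only: `itn_batch` and its `batch_size` default are hypothetical and not part of this model card, and it expects the `tok` and `model` objects created above to be passed in.

```python
def itn_batch(texts, tok, model, batch_size=16, max_new_tokens=140, num_beams=4):
    """Normalize a list of strings in fixed-size batches."""
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        # padding=True pads to the longest item and returns an attention mask.
        inp = tok(chunk, return_tensors="pt", padding=True)
        # generate() does not track gradients in current transformers releases;
        # wrap the call in torch.no_grad() if you prefer to be explicit.
        out = model.generate(
            **inp, max_new_tokens=max_new_tokens, num_beams=num_beams
        )
        results.extend(tok.batch_decode(out, skip_special_tokens=True))
    return results
```

Calling `itn_batch(["сорок два відсотки населення", "пʼятнадцять доларів і двадцять центів"], tok, model)` returns one output string per input, in the same order.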

Training

  • Base: google/mt5-small
  • Dataset: Mikhailo/ubertext-itn-uk (~889k pairs)
  • Steps: 50,000, batch size 48, learning rate 5e-4, bf16 precision
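To put these numbers in perspective, here is a rough estimate of how many passes over the data the run made. It assumes that 48 is the effective global batch size (no gradient accumulation or multi-device scaling) and that all ~889k pairs were used for training. Neither assumption is stated in this card.

```python
steps = 50_000
batch_size = 48          # assumed to be the effective global batch size
dataset_pairs = 889_000  # approximate size of Mikhailo/ubertext-itn-uk

examples_seen = steps * batch_size
epochs = examples_seen / dataset_pairs
print(f"{examples_seen:,} examples seen, about {epochs:.1f} epochs")
```

Under those assumptions the model saw 2.4 million training examples, or roughly 2.7 epochs.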

Model size: 0.2B params (Safetensors, F32)