mT5-small ITN Ukrainian
Inverse Text Normalization model for Ukrainian, fine-tuned from google/mt5-small.
For fast CPU inference, use the CTranslate2 version: Mikhailo/mT5-small-ITN-uk-ct2
Usage
from transformers import MT5ForConditionalGeneration, AutoTokenizer
import torch

# Load the tokenizer and model once; eval() disables dropout for inference.
tok = AutoTokenizer.from_pretrained("Mikhailo/mT5-small-ITN-uk")
model = MT5ForConditionalGeneration.from_pretrained("Mikhailo/mT5-small-ITN-uk").eval()

def itn(text):
    # Convert spoken-form Ukrainian text to its written form.
    inp = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inp, max_new_tokens=140, num_beams=4)
    return tok.decode(out[0], skip_special_tokens=True)

print(itn("сорок два відсотки населення"))  # → 42% населення
print(itn("пʼятнадцять доларів і двадцять центів"))  # → $15.20
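For the CTranslate2 version mentioned at the top, inference might look roughly like the sketch below. It follows the usual CTranslate2 pattern for T5-style models (tokenize to subword strings, call `translate_batch`, convert back). The helper names `load_ct2` and `itn_ct2` are hypothetical, and the sketch assumes that the converted model files sit at the root of the `Mikhailo/mT5-small-ITN-uk-ct2` repository and that the tokenizer can be taken from the original `Mikhailo/mT5-small-ITN-uk` repository; neither is confirmed here.

```python
def load_ct2(ct2_repo="Mikhailo/mT5-small-ITN-uk-ct2",
             tok_repo="Mikhailo/mT5-small-ITN-uk"):
    """Hypothetical loader: downloads the CTranslate2 model and the tokenizer."""
    # Imported lazily so itn_ct2 below can be used with any compatible objects.
    import ctranslate2
    from huggingface_hub import snapshot_download
    from transformers import AutoTokenizer

    # Assumption: the converted model files are stored at the repo root.
    model_dir = snapshot_download(ct2_repo)
    translator = ctranslate2.Translator(model_dir, device="cpu")
    tokenizer = AutoTokenizer.from_pretrained(tok_repo)
    return translator, tokenizer


def itn_ct2(text, translator, tokenizer, beam_size=4, max_len=140):
    """Normalize one sentence with a CTranslate2 translator."""
    # CTranslate2 works on subword strings rather than tensors of ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    results = translator.translate_batch(
        [tokens], beam_size=beam_size, max_decoding_length=max_len
    )
    out_tokens = results[0].hypotheses[0]  # best hypothesis for the first input
    out_ids = tokenizer.convert_tokens_to_ids(out_tokens)
    return tokenizer.decode(out_ids, skip_special_tokens=True)
```

A typical call would be `translator, tokenizer = load_ct2()` once at startup, then `itn_ct2("сорок два відсотки населення", translator, tokenizer)` per sentence. The beam size and length limit mirror the `num_beams=4` and `max_new_tokens=140` settings of the transformers example.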
Training
- Base: google/mt5-small
- Dataset: Mikhailo/ubertext-itn-uk (~889k pairs)
- Steps: 50000, batch 48, lr 5e-4, bf16
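As a rough sanity check on these numbers, the snippet below estimates how many passes over the dataset the run corresponds to. It assumes that 48 is the effective global batch size (no gradient accumulation or multi-device scaling) and that all ~889k pairs were used for training; the function name `approx_epochs` is only illustrative.

```python
def approx_epochs(steps, batch_size, dataset_pairs):
    """Return (examples seen, approximate number of passes over the dataset)."""
    examples_seen = steps * batch_size
    return examples_seen, examples_seen / dataset_pairs


# Values from the list above; the dataset size is approximate (~889k pairs).
examples_seen, epochs = approx_epochs(steps=50_000, batch_size=48, dataset_pairs=889_000)
print(f"{examples_seen:,} examples, about {epochs:.1f} passes over the data")
# → 2,400,000 examples, about 2.7 passes over the data
```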