---
license: cc-by-4.0
datasets:
- comma-project/alignement-pairs
language:
- fr
- la
base_model:
- google/byt5-small
pipeline_tag: translation
examples:
- text: "Scͥbo uobiᷤᷤ ñ pauli ł donati."
- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
---

# ByT5-Small for Normalization

This model normalizes ATR (automatic text recognition) output following the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to overnormalize and to add punctuation.

```py
from transformers import pipeline
import unicodedata

pipe = pipeline(
    task="text2text-generation",
    # Hub ID; change to a local directory if needed
    model="comma-project/normalization-byt5-small",
    tokenizer="comma-project/normalization-byt5-small"
)

# Decompose the input to NFD before passing it to the model
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
```
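
The usage example passes its input through `unicodedata.normalize("NFD", ...)` without saying why. A plausible reading (an assumption, not something this card states) is that the model expects decomposed input, where each abbreviation mark is a separate combining character. Because ByT5 works on raw UTF-8 bytes, the precomposed and decomposed forms of the same visible text are different byte sequences. The sketch below uses only the standard library; `to_nfd` and `utf8_bytes` are illustrative helper names, not part of the model's API.

```py
import unicodedata


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base letter + combining marks."""
    return unicodedata.normalize("NFD", text)


def utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 byte values, i.e. what a byte-level model reads."""
    return list(text.encode("utf-8"))


# U+00F1 is the precomposed form of n + combining tilde (U+0303).
precomposed = "\u00f1 pauli"
decomposed = to_nfd(precomposed)

# Same visible text, different code points and different bytes.
print([f"U+{ord(c):04X}" for c in precomposed[:1]])
print([f"U+{ord(c):04X}" for c in decomposed[:2]])
print(utf8_bytes(precomposed[:1]))
print(utf8_bytes(decomposed[:2]))
```

Applying `to_nfd` to every string before calling the pipeline keeps the input in one consistent form, whichever form the transcription tool produced.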