metadata
license: cc-by-4.0
datasets:
  - comma-project/alignement-pairs
language:
  - fr
  - la
base_model:
  - google/byt5-small
pipeline_tag: translation
examples:
  - text: Scͥbo uobiᷤᷤ  pauli ł donati.
  - text: >-
      Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les
      lois ne tont droiture 

ByT5-Small for Normalization

This model normalizes ATR (automatic text recognition) output using the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to over-normalize and to add punctuation.

from transformers import pipeline
import unicodedata

pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",      # Hub model ID or local directory
    tokenizer="comma-project/normalization-byt5-small",
)
# The input is converted to Unicode NFD form before being passed to the model.
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
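
The snippet above normalizes its input to NFD before calling the pipeline. The sketch below shows what that step does to a precomposed character, and wraps the call for several lines at once. `to_nfd` and `normalize_lines` are hypothetical helper names, not part of the model or of `transformers`. The sketch assumes the pipeline returns a list with one `generated_text` dict per call, as in the output shown above.

```python
import unicodedata
from typing import Callable, Iterable, List


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base letter + combining marks."""
    return unicodedata.normalize("NFD", text)


def normalize_lines(pipe: Callable, lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: NFD-normalize each line, run it through `pipe`,
    and keep only the generated text (assumes the output shape shown above)."""
    return [pipe(to_nfd(line))[0]["generated_text"] for line in lines]


# A precomposed "ñ" (U+00F1) becomes "n" followed by a combining tilde (U+0303).
decomposed = to_nfd("\u00f1")
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+006E', 'U+0303']
```

With the `pipe` object created earlier, `normalize_lines(pipe, ["first line", "second line"])` would return one normalized string per input line.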