license: cc-by-4.0
datasets:
  - comma-project/alignement-pairs
language:
  - fr
  - la
base_model:
  - google/byt5-small
pipeline_tag: translation
examples:
  - text: Scͥbo uobiᷤᷤ ñ pauli ł donati.
  - text: >-
      Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les
      lois ne tont droiture

ByT5-Small for Normalization

This model normalizes automatic text recognition (ATR) output following the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to over-normalize and to add punctuation.

from transformers import pipeline
import unicodedata

pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",  # Hub model ID, or a local directory
    tokenizer="comma-project/normalization-byt5-small"
)
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
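
To normalize several ATR lines in one call, the single-line example above can be wrapped in a small helper. This is a minimal sketch and not part of the model's API: `normalize_lines` is a hypothetical name, and the code assumes that `pipe` accepts a list of strings and returns, for each input, either a `{'generated_text': ...}` dict or a one-element list holding such a dict.

```python
import unicodedata


def normalize_lines(pipe, lines):
    """Run a normalization pipeline over several ATR lines.

    `pipe` is any callable that accepts a list of strings, such as the
    pipeline created above. This helper is a hypothetical illustration.
    """
    # Apply the same NFD decomposition as in the single-line example.
    prepared = [unicodedata.normalize("NFD", line) for line in lines]
    outputs = pipe(prepared)

    results = []
    for output in outputs:
        # Assumption: each item is a dict, or a one-element list of dicts.
        if isinstance(output, list):
            output = output[0]
        results.append(output["generated_text"])
    return results
```

With the `pipe` object defined above, `normalize_lines(pipe, atr_lines)` should return the normalized strings in input order, where `atr_lines` stands for your own list of transcribed lines.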