---
license: cc-by-4.0
datasets:
- comma-project/alignement-pairs
language:
- fr
- la
base_model:
- google/byt5-small
pipeline_tag: translation
examples:
- text: "Scͥbo uobiᷤᷤ ñ pauli ł donati."
- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
---
# ByT5-Small for Normalization
This model normalizes ATR (automatic text recognition) output following the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to
over-normalize and to add punctuation.
```py
from transformers import pipeline
import unicodedata

pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",  # Hub ID, or a local directory
    tokenizer="comma-project/normalization-byt5-small",
)

# Decompose the input to NFD before passing it to the model
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
```
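The `unicodedata.normalize("NFD", ...)` call in the example above changes what the model actually receives: ByT5 reads raw UTF-8 bytes, so a precomposed character and its decomposed form are different inputs. The sketch below illustrates this with the standard library only. The helper names `to_nfd` and `describe` are hypothetical, and the idea that the model expects NFD input is an assumption drawn from the usage example, not something this card states.

```py
import unicodedata


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base letter + combining marks."""
    return unicodedata.normalize("NFD", text)


def describe(text: str) -> list[str]:
    """List each code point with its Unicode name, to inspect a string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


composed = "\u00f1"            # "ñ" as one precomposed code point
decomposed = to_nfd(composed)  # "n" followed by a combining tilde

print(describe(composed))
print(describe(decomposed))

# A byte-level model sees these as different byte sequences.
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
```

Applying `to_nfd` to every line before calling `pipe` keeps inputs in a single, consistent form, whichever form the ATR engine emitted.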