---
license: cc-by-4.0
datasets:
- comma-project/alignement-pairs
language:
- fr
- la
base_model:
- google/byt5-small
pipeline_tag: translation
examples:
- text: "Scͥbo uobiᷤᷤ ñ pauli ł donati."
- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
---

# ByT5-Small for Normalization

This model normalizes ATR (automatic text recognition) output that follows the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to
over-normalize and to add punctuation.

```py
from transformers import pipeline
import unicodedata

pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",  # Hub ID, or a path to a local directory
    tokenizer="comma-project/normalization-byt5-small"
)
# Decompose the input to Unicode NFD before passing it to the model
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
```
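
The usage example passes its input through `unicodedata.normalize("NFD", ...)` before inference. The sketch below only illustrates what that step does to the text, using the standard library; `to_nfd` is a hypothetical helper name, and the idea that the model expects decomposed input is an assumption drawn from the example above, not a documented requirement.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Hypothetical helper: decompose precomposed characters (Unicode NFD)."""
    return unicodedata.normalize("NFD", text)


composed = "\u00f1"  # "ñ" stored as one precomposed code point
decomposed = to_nfd(composed)

# NFD splits it into a base letter followed by a combining mark
print([f"U+{ord(c):04X}" for c in decomposed])
# ['U+006E', 'U+0303']
```

Characters without a canonical decomposition, such as `ł` (U+0142), pass through unchanged.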