---
license: cc-by-4.0
datasets:
- comma-project/alignement-pairs
language:
- fr
- la
base_model:
- google/byt5-small
pipeline_tag: translation
examples:
- text: "Scͥbo uobiᷤᷤ ñ pauli ł donati."
- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
---

# ByT5-Small for Normalization

This model normalizes ATR output following the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to
over-normalize and to add punctuation.

```py
from transformers import pipeline
import unicodedata

pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",      # Hub model ID (or a local directory)
    tokenizer="comma-project/normalization-byt5-small"
)
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
```
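
The example above passes the input through `unicodedata.normalize("NFD", ...)` before calling the model. ByT5 operates on raw UTF-8 bytes, so a precomposed character and its decomposed form reach the model as different byte sequences. The sketch below illustrates that difference; the helper name `to_nfd` is hypothetical, and the idea that the model expects NFD input is an assumption drawn from the usage example, not a documented requirement.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode NFD (canonical decomposition).

    Hypothetical helper: it only wraps the call used in the example above.
    """
    return unicodedata.normalize("NFD", text)


composed = "\u00f1"            # precomposed LATIN SMALL LETTER N WITH TILDE
decomposed = to_nfd(composed)  # "n" followed by U+0303 COMBINING TILDE

# One code point before decomposition, two after.
print(len(composed), len(decomposed))

# A byte-level model sees these two forms as different inputs.
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
```

Applying the same decomposition to every input keeps abbreviation marks written as combining characters in one consistent byte form.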