---
license: cc-by-4.0
datasets:
- comma-project/alignement-pairs
language:
- fr
- la
base_model:
- google/byt5-small
pipeline_tag: translation
examples:
- text: "Scͥbo uobiᷤᷤ ñ pauli ł donati."
- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
---

# ByT5-Small for Normalization

This model normalizes automatic text recognition (ATR) output using the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to overnormalize and to add punctuation.

```py
import unicodedata

from transformers import pipeline

pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",  # Hub ID or local directory
    tokenizer="comma-project/normalization-byt5-small",
)
# The input is decomposed to Unicode NFD before it is passed to the model.
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
```