comma-project/normalization-byt5-small


ponteineptique committed on Dec 22, 2025

Commit 96e7367 · verified · 1 Parent(s): cc65a2d

Create README.md

Files changed (1)

README.md +31 -0

README.md ADDED

	@@ -0,0 +1,31 @@

+---
+license: cc-by-4.0
+datasets:
+- comma-project/alignement-pairs
+language:
+- fr
+- la
+base_model:
+- google/byt5-small
+pipeline_tag: translation
+examples:
+- text: "⁊ non facimus ĩitatem."
+- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
+---
+# ByT5-Small for Normalization
+This model normalizes ATR (automatic text recognition) output following the CATMuS guidelines, for both Latin and Old French.
+It fixes spacing, but it tends to overnormalize and to add punctuation.
+```py
+from transformers import pipeline
+pipe = pipeline(
+    task="text2text-generation",  # change if needed
+    model=".",                    # local directory containing the model files
+    tokenizer=".",                # tokenizer loaded from the same directory
+)
+pipe("⁊ non facimus ĩitatem.")
+# [{'generated_text': ' non facimus veritatem.'}]
+```
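
The README example normalizes a single string. As a minimal sketch of how several ATR lines might be processed in one go, the helper below wraps any pipeline-like callable. The name `normalize_lines` is hypothetical and not part of the repository; it only assumes that the callable returns a list of dicts with a `generated_text` key, as in the output shown above, and it strips the leading space that output contains.

```python
from typing import Callable, Iterable, List


def normalize_lines(pipe: Callable, lines: Iterable[str]) -> List[str]:
    """Normalize each non-empty line with `pipe` (hypothetical helper).

    `pipe` is assumed to behave like the pipeline in the README example:
    called with one string, it returns [{'generated_text': '...'}].
    """
    normalized = []
    for line in lines:
        text = line.strip()
        if not text:
            # Skip blank lines rather than sending them to the model.
            continue
        result = pipe(text)
        # Keep the first candidate and drop surrounding whitespace.
        normalized.append(result[0]["generated_text"].strip())
    return normalized
```

With the `pipe` object from the README, a call such as `normalize_lines(pipe, ["⁊ non facimus ĩitatem."])` would return the normalized lines as plain strings. Since the model tends to overnormalize and add punctuation, the results are best reviewed against the source lines.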