	---
	license: cc-by-4.0
	datasets:
	- comma-project/alignement-pairs
	language:
	- fr
	- la
	base_model:
	- google/byt5-small
	pipeline_tag: translation
	examples:
	- text: "Scͥbo uobiᷤᷤ ñ pauli ł donati."
	- text: "Car toutes les lois sont fõ dees cor recon droitu riele pour quoi se les lois ne tont droiture "
	---

	# ByT5-Small for Normalization

	This model normalizes ATR (automatic text recognition) output following the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to
	over-normalize and to add punctuation.

	```py
	from transformers import pipeline
	import unicodedata

	# Load the model and its tokenizer from the Hugging Face Hub
	pipe = pipeline(
	    task="text2text-generation",  # change if needed
	    model="comma-project/normalization-byt5-small",  # or a path to a local copy
	    tokenizer="comma-project/normalization-byt5-small",
	)
	# The input is decomposed to NFD before being passed to the model
	pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
	# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
	```
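
The usage example passes its input through `unicodedata.normalize("NFD", ...)` before calling the pipeline. As a minimal sketch of what that step does, the snippet below decomposes two characters using only the standard library. The helper name `to_nfd` is hypothetical and not part of this repository, and the assumption that the model expects NFD input is inferred from the usage example rather than stated by the authors.

```python
import unicodedata


def to_nfd(line: str) -> str:
    """Return the line in Unicode NFD form (hypothetical helper, not part of the model)."""
    return unicodedata.normalize("NFD", line)


# Precomposed "ñ" (U+00F1) splits into "n" + combining tilde (U+0303).
decomposed = to_nfd("\u00f1")
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+006E', 'U+0303']

# "ł" (U+0142) has no canonical decomposition, so NFD leaves it unchanged.
unchanged = to_nfd("\u0142")
print(unchanged == "\u0142")  # True
```

Since ByT5 operates on UTF-8 bytes, a precomposed character and its decomposed equivalent reach the model as different byte sequences, so applying the same normalization form to every input line is likely worth doing.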