SemplificaAI/t5-temporal-normalizer

@@ -22,7 +22,7 @@ tags:
 By operating at the character level (UTF-8 bytes), ByT5 is intrinsically immune to typos, dirty OCR outputs, and Out-Of-Vocabulary (OOV) tokens, making it exceptionally reliable for real-world, messy documents.
 The model expects an **Anchor Date** (reference date), an optional **Language Code**, and the **Temporal String** as input:
-> Input format: `YYYY-MM-DD | lang | input_text`
+> Input format: `YYYY-MM-DD | lang (optional) | input_text`
 ## Use Cases
@@ -64,7 +64,7 @@ model_id = "SemplificaAI/t5-temporal-normalizer"
 tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
 model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
-# Format: YYYY-MM-DD | lang | text
+# Format: YYYY-MM-DD | lang (optional) | text
 input_text = "2024-01-01 | en | 3 days post admission"
 inputs = tokenizer(input_text, return_tensors="pt")
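
The change marks the language code as optional, so callers may want a single place that assembles the input string. The sketch below is one possible way to do that. `build_input` is a hypothetical helper and not part of the model repository. The README does not show what an input without a language code looks like, so the two-field form `YYYY-MM-DD | text` produced here is an assumption that should be checked against the model's actual behavior.

```python
from datetime import date
from typing import Optional


def build_input(anchor: date, text: str, lang: Optional[str] = None) -> str:
    """Assemble a model input string (hypothetical helper, not from the repo).

    With a language code: "YYYY-MM-DD | lang | text".
    Without one: "YYYY-MM-DD | text". This two-field form is an ASSUMPTION;
    the README only states that the language code is optional.
    """
    parts = [anchor.isoformat()]  # date.isoformat() yields YYYY-MM-DD
    if lang:
        parts.append(lang)
    parts.append(text.strip())
    return " | ".join(parts)


# Reproduces the example string used in the README snippet.
input_text = build_input(date(2024, 1, 1), "3 days post admission", lang="en")
print(input_text)
```

The resulting string can be passed to `tokenizer(input_text, return_tensors="pt")` exactly as in the snippet above.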