Upload README.md with huggingface_hub
README.md
CHANGED
@@ -22,7 +22,7 @@ tags:
 By operating at the character level (UTF-8 bytes), ByT5 is intrinsically immune to typos, dirty OCR outputs, and Out-Of-Vocabulary (OOV) tokens, making it exceptionally reliable for real-world, messy documents.
 
 The model expects an **Anchor Date** (reference date), an optional **Language Code**, and the **Temporal String** as input:
-> Input format: `YYYY-MM-DD | lang | input_text`
+> Input format: `YYYY-MM-DD | lang (optional) | input_text`
 
 ## Use Cases
 
@@ -64,7 +64,7 @@ model_id = "SemplificaAI/t5-temporal-normalizer"
 tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
 model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
 
-# Format: YYYY-MM-DD | lang | text
+# Format: YYYY-MM-DD | lang (optional) | text
 input_text = "2024-01-01 | en | 3 days post admission"
 inputs = tokenizer(input_text, return_tensors="pt")
 
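The input format touched by this change can be assembled programmatically. The following is a minimal sketch, not code from the model repository: `build_input` is a hypothetical helper, and the handling of a missing language code (dropping the segment and its separator entirely) is an assumption, since the README marks `lang` as optional without showing what the string looks like when it is omitted.

```python
from datetime import date
from typing import Optional


def build_input(anchor: date, text: str, lang: Optional[str] = None) -> str:
    """Compose an input string of the form 'YYYY-MM-DD | lang | text'.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # date.isoformat() yields the YYYY-MM-DD form the README asks for.
    parts = [anchor.isoformat()]
    # ASSUMPTION: when no language code is given, the whole segment is
    # omitted. The README only states that the code is optional.
    if lang:
        parts.append(lang)
    parts.append(text.strip())
    return " | ".join(parts)


# Reproduces the example string used in the README snippet.
print(build_input(date(2024, 1, 1), "3 days post admission", lang="en"))
```

The resulting string can be passed to the tokenizer exactly as `input_text` is in the README snippet. If the model turns out to expect an empty segment (for example `2024-01-01 |  | text`) when the language is omitted, only the `if lang:` branch needs to change.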