---
language:
- en
- it
- fr
- es
- de
- pt
tags:
- temporal-normalization
- byt5
- onnx
- medical
---
# Semplifica T5 Temporal Normalizer
## Model Description
**Semplifica T5 Temporal Normalizer** is a fine-tuned version of Google's [ByT5-Small](https://huggingface.co/google/byt5-small) specifically designed to solve a complex NLP problem: **normalizing noisy, slang, relative, and incomplete temporal expressions** into standard ISO formats (`YYYY-MM-DD` or `HH:MM`).
Because it operates at the character level (UTF-8 bytes), ByT5 has no Out-Of-Vocabulary (OOV) tokens and is highly robust to typos and noisy OCR output. This makes it well suited to real-world, messy documents.
The model expects an **Anchor Date** (reference date), an optional **Language Code**, and the **Temporal String** as input:
> Input format: `YYYY-MM-DD | lang (optional) | input_text`
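A small helper can build this input string and validate the anchor date before calling the model. The following is a minimal sketch, and `format_input` is a hypothetical name. The card does not say how the optional language field is written when it is omitted, so this sketch always includes it. Stripping surrounding whitespace from the text is also an assumption.

```python
from datetime import date


def format_input(anchor: date, text: str, lang: str) -> str:
    """Build a model input of the form 'YYYY-MM-DD | lang | input_text'.

    Hypothetical helper: the language field is always included because the
    format for omitting it is not documented. Whitespace stripping is an
    assumption, not a documented requirement.
    """
    # date.isoformat() always yields YYYY-MM-DD, so the anchor is well-formed
    return f"{anchor.isoformat()} | {lang} | {text.strip()}"


print(format_input(date(2024, 1, 1), "3 days post admission", "en"))
```

Taking a `datetime.date` instead of a raw string means an invalid anchor such as `2024-02-30` fails when the date object is created, before it ever reaches the model.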
## Use Cases
1. **Clinical & Medical (EHR) — Primary:** Extract precise timelines from Electronic Health Records where doctors use extreme abbreviations ("3 days post-op", "admission + 2").
2. **Legal & Compliance:** Analyze legal contracts with relative deadlines ("within 30 days from signature").
3. **Conversational AI & Booking:** Chatbots processing user requests like "book a flight for next Tuesday afternoon".
4. **Logistics & Supply Chain:** Parsing informal shipping emails ("expected delivery in 2 days").
## Hardware Portability & ONNX
A core goal of this model is **universal portability**. It has been exported to **ONNX** in three precision formats:
| Format | Size | Notes |
|--------|------|-------|
| FP32 | ~1.14 GB | Full precision (Encoder + Decoder separated), validation reference |
| FP16 | ~738 MB | Half precision, ideal for GPU/NPU with Tensor Cores |
| INT8 | ~290 MB | Symmetric per-tensor weight quantization (~75% reduction vs FP32), ideal for CPU / Edge / Rust |
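The Notes column can be turned into a simple selection rule at load time. The sketch below assumes you pass in the list returned by `onnxruntime.get_available_providers()`. The function name `pick_model_variant` is hypothetical. The FP16 file name follows the pattern of the `byt5_encoder_int8.onnx` / `byt5_encoder_fp32.onnx` names used elsewhere in this card, but it is an assumption.

```python
def pick_model_variant(available: list[str]) -> tuple[str, list[str]]:
    """Pick an ONNX precision and provider list, following the Notes column.

    `available` is typically onnxruntime.get_available_providers().
    Hypothetical helper, not part of the released code.
    """
    if "CUDAExecutionProvider" in available:
        # GPU present: FP16, keeping CPU as a fallback provider
        return "fp16", ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # CPU / edge: INT8 for the smallest footprint
    return "int8", ["CPUExecutionProvider"]


precision, providers = pick_model_variant(["CPUExecutionProvider"])
encoder_path = f"byt5_encoder_{precision}.onnx"  # fp16 file name is an assumption
print(encoder_path, providers)
```

In the GPU evaluation reported in this card, FP32 had the highest throughput. On GPU, FP16 is therefore a memory-footprint choice rather than a guaranteed speed-up.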
## Evaluation Metrics (ONNX Runtime)
Evaluated on GPU (`CUDAExecutionProvider`) with a 1,000-record evaluation sample:
| Model Format | Size | Exact Match Accuracy | F1 (Macro) | Throughput (samples/s) |
|--------------|------|----------------------|------------|------------------------|
| **FP32** | ~1.14 GB | 99.40% | 99.53% | ~44.0 |
| **FP16** | ~738 MB | 99.40% | 99.53% | ~39.8 |
| **INT8** | ~290 MB | 99.40% | 99.53% | ~31.7 |
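The card does not spell out how the metrics were computed. Exact match usually means the share of outputs that are identical to the reference string. On a 1,000-record sample, 99.40% corresponds to 994 exact matches. Below is a minimal sketch of that metric. Stripping whitespace before comparing is an assumption. F1 (Macro) is left out because its exact definition here is not given.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to the reference (after stripping whitespace)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("empty evaluation set")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


print(exact_match_accuracy(["2024-01-04", "10:30"], ["2024-01-04", "10:45"]))
```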
---
## Usage in Python (HuggingFace Transformers)
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_id = "SemplificaAI/t5-temporal-normalizer"
# Important: always load the tokenizer from the base model to avoid a known
# ByT5 tokenizer serialization bug in transformers >= 5.x
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
# Format: YYYY-MM-DD | lang (optional) | text
input_text = "2024-01-01 | en | 3 days post admission"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=16)
# Use skip_special_tokens=False + manual cleanup to avoid a deadlock bug
# in transformers >= 5.x with skip_special_tokens=True
result = tokenizer.decode(outputs[0], skip_special_tokens=False)
result = result.replace("<pad>", "").replace("</s>", "").strip()
print(result)
# Output: 2024-01-04
```
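The model emits free text, so downstream code may want to check that each output really is one of the two documented formats (`YYYY-MM-DD` or `HH:MM`) before using it. The sketch below does that with the standard library. The name `parse_prediction` is hypothetical. Rejecting anything else with `ValueError` is a design choice, not documented model behaviour.

```python
import re
from datetime import date, time
from typing import Union

_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_TIME_RE = re.compile(r"[0-9]{2}:[0-9]{2}")


def parse_prediction(text: str) -> Union[date, time]:
    """Validate a model output against the two documented output formats.

    Returns a datetime.date for YYYY-MM-DD and a datetime.time for HH:MM.
    Raises ValueError for anything else, including calendar-invalid values
    such as 2024-02-30 or 25:00.
    """
    text = text.strip()
    if _DATE_RE.fullmatch(text):
        return date.fromisoformat(text)
    if _TIME_RE.fullmatch(text):
        return time.fromisoformat(text)
    raise ValueError(f"unexpected model output: {text!r}")


print(parse_prediction("2024-01-04"))  # 2024-01-04
print(parse_prediction("14:30"))       # 14:30:00
```

The strict regex check runs first because newer Python versions make `date.fromisoformat` accept more than the plain `YYYY-MM-DD` form.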
## Usage in Python (ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
opts = ort.SessionOptions()
enc_sess = ort.InferenceSession("byt5_encoder_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"])
dec_sess = ort.InferenceSession("byt5_decoder_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"])

input_text = "2024-01-01 | en | 3 days post admission"
enc = tokenizer(input_text, return_tensors="np", max_length=64, padding="max_length", truncation=True)

# 1. Encoder forward pass
enc_hs = enc_sess.run(None, {
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],
})[0]

# 2. Autoregressive greedy decode loop over a fixed-length decoder buffer
MAX_OUT_LEN = 16
PAD_ID = 0  # T5 uses PAD as the decoder start token
EOS_ID = 1
cur_ids = np.zeros((1, MAX_OUT_LEN), dtype=np.int64)
cur_mask = np.zeros((1, MAX_OUT_LEN), dtype=np.int64)
cur_ids[0, 0] = PAD_ID
cur_mask[0, 0] = 1
generated = []
for step in range(MAX_OUT_LEN - 1):
    logits = dec_sess.run(None, {
        "decoder_input_ids": cur_ids,
        "decoder_attention_mask": cur_mask,
        "encoder_hidden_states": enc_hs,
        "encoder_attention_mask": enc["attention_mask"],
    })[0]
    # Logits at position `step` predict the token for position step + 1
    next_tok = int(np.argmax(logits[0, step]))
    if next_tok == EOS_ID:
        break
    generated.append(next_tok)
    cur_ids[0, step + 1] = next_tok
    cur_mask[0, step + 1] = 1

# Map ids 3..258 back to bytes 0..255, skipping special and out-of-range ids
output_text = bytes([t - 3 for t in generated if 3 <= t < 259]).decode("utf-8", errors="ignore")
print("Prediction:", output_text)
```
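The loop above can be factored into a function that takes any "decoder step" callable. This makes the fixed-buffer logic testable without ONNX Runtime or model files. The sketch below uses hypothetical names (`greedy_decode`, `fake_step`). `fake_step` is a stand-in that just spells out a target string, not the real decoder. With the real sessions, you would pass a lambda wrapping `dec_sess.run` as shown in the trailing comment.

```python
import numpy as np

PAD_ID, EOS_ID = 0, 1


def greedy_decode(step_logits, max_out_len: int = 16) -> list:
    """Greedy decoding over a fixed-length decoder buffer.

    step_logits(cur_ids, cur_mask) must return logits of shape
    (1, max_out_len, vocab_size), like the ONNX decoder above.
    """
    cur_ids = np.full((1, max_out_len), PAD_ID, dtype=np.int64)  # position 0 = start token
    cur_mask = np.zeros((1, max_out_len), dtype=np.int64)
    cur_mask[0, 0] = 1
    generated = []
    for step in range(max_out_len - 1):
        logits = step_logits(cur_ids, cur_mask)
        next_tok = int(np.argmax(logits[0, step]))
        if next_tok == EOS_ID:
            break
        generated.append(next_tok)
        cur_ids[0, step + 1] = next_tok
        cur_mask[0, step + 1] = 1
    return generated


def fake_step(cur_ids, cur_mask, target=b"10:30"):
    """Hypothetical stand-in decoder: predicts the next byte of `target`, then EOS."""
    step = int(cur_mask.sum()) - 1
    logits = np.zeros((1, cur_ids.shape[1], 259), dtype=np.float32)
    nxt = target[step] + 3 if step < len(target) else EOS_ID
    logits[0, step, nxt] = 1.0
    return logits


ids = greedy_decode(fake_step)
print(bytes(t - 3 for t in ids).decode("utf-8"))  # 10:30

# With the real sessions from the example above:
# ids = greedy_decode(lambda i, m: dec_sess.run(None, {
#     "decoder_input_ids": i, "decoder_attention_mask": m,
#     "encoder_hidden_states": enc_hs,
#     "encoder_attention_mask": enc["attention_mask"]})[0])
```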
## Usage in Go (ONNX Runtime)
A highly optimized Go evaluation pipeline is available in the `go_eval` directory, demonstrating the separation of Encoder and Decoder execution with pre-allocated tensors and fixed sequence padding (`MAX_OUT_LEN = 16`). It supports fallback to `CUDAExecutionProvider`.
```go
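package main

import (
	"log"

	ort "github.com/yalue/onnxruntime_go"
)

// maxOutLen is the fixed decoder sequence length expected by the exported graph.
const maxOutLen = 16

func main() {
	ort.SetSharedLibraryPath("libonnxruntime.so")
	if err := ort.InitializeEnvironment(); err != nil {
		log.Fatal(err)
	}
	defer ort.DestroyEnvironment()

	// Load separated ONNX models (input/output names and pre-allocated
	// tensors are elided here; see the go_eval directory)
	encSess, err := ort.NewAdvancedSession("byt5_encoder_fp32.onnx", /* ... */)
	if err != nil {
		log.Fatal(err)
	}
	defer encSess.Destroy()
	decSess, err := ort.NewAdvancedSession("byt5_decoder_fp32.onnx", /* ... */)
	if err != nil {
		log.Fatal(err)
	}
	defer decSess.Destroy()

	// 1. Encoder pass
	if err := encSess.Run(); err != nil {
		log.Fatal(err)
	}
	// 2. Decoder autoregressive loop with fixed-size buffers and mask
	for step := 0; step < maxOutLen-1; step++ {
		if err := decSess.Run(); err != nil {
			log.Fatal(err)
		}
		// Read logits at position step, take the argmax, stop on EOS (id 1),
		// otherwise write the token and set the mask at position step+1
	}
}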
```
## Usage in Rust (ONNX Runtime)
For production environments, use the [`ort`](https://github.com/pykeio/ort) crate. Since T5 is an encoder-decoder architecture, generation requires an autoregressive loop.
```toml
# Cargo.toml
[dependencies]
ort = "2.0"
```
```rust
use ort::{GraphOptimizationLevel, Session};

fn main() -> ort::Result<()> {
    let session = Session::builder()?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_intra_threads(4)?
        .commit_from_file("byt5_encoder_fp32.onnx")?;
    // ByT5 tokenization: each UTF-8 byte maps to token_id = byte + 3
    // (0=pad, 1=eos, 2=unk, then 3..258 = bytes 0..255)
    // Load both encoder and decoder sessions, then run autoregressive loop with fixed size padding
    Ok(())
}
```
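The byte-to-id mapping described in the comments above is simple enough to port by hand. As a reference for checking a port (Rust or otherwise), here is a minimal Python sketch. The function names are hypothetical. Appending EOS mirrors what the Hugging Face ByT5 tokenizer does by default.

```python
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
OFFSET = 3  # byte b maps to token id b + 3


def byt5_encode(text: str, add_eos: bool = True) -> list:
    """Map each UTF-8 byte to token_id = byte + 3, optionally appending EOS."""
    ids = [b + OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def byt5_decode(ids: list) -> str:
    """Invert the mapping, skipping special tokens and any id outside 3..258."""
    raw = bytes(i - OFFSET for i in ids if OFFSET <= i < OFFSET + 256)
    return raw.decode("utf-8", errors="ignore")


print(byt5_encode("2024"))                # [53, 51, 53, 55, 1]
print(byt5_decode(byt5_encode("città")))  # città
```

Non-ASCII characters such as `à` become two tokens (one per UTF-8 byte), which is why there is no out-of-vocabulary case.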
## Technical Notes
- **ByT5 Tokenizer:** Each UTF-8 byte maps to `token_id = byte_value + 3`. Tokens 0/1/2 are PAD/EOS/UNK. Always load the tokenizer from `google/byt5-small` — the fine-tuned checkpoint may have a corrupted tokenizer config due to a known serialization bug in `transformers >= 5.x`.
- **ONNX Export:** Exported with `torch.onnx.export(dynamo=True)` + `onnxscript`. The old JIT tracer (`dynamo=False`) is incompatible with the new masking utilities in `transformers >= 5.x`.
- **INT8 Quantization:** Symmetric per-tensor quantization applied directly to the ONNX graph initializers (numpy-based). PyTorch `quantize_dynamic` models are not exportable via the dynamo exporter (`LinearPackedParamsBase` is not serializable by `torch.export`).
- **ONNX Architecture:** To work around runtime broadcasting issues with ByT5's relative position bias, the model is exported as **separate Encoder and Decoder** graphs. The Decoder expects a fixed-length input sequence of 16 tokens. During the autoregressive loop, each generated token is written into this buffer and the attention mask marks the positions filled so far (see the Python and Go examples above).
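The numeric core of symmetric per-tensor INT8 quantization fits in a few lines of NumPy. The sketch below is not the exporter's actual code. It does not show how the int8 initializers and scales are wired back into the ONNX graph, which the card does not describe. Clipping to `[-127, 127]` (rather than `[-128, 127]`) is an assumption typical of symmetric schemes. Storing 1 byte per weight instead of 4 is consistent with the ~75% size reduction reported above.

```python
import numpy as np


def quantize_symmetric_per_tensor(w: np.ndarray):
    """Quantize a float tensor to int8 with a single scale and zero-point 0."""
    max_abs = float(np.max(np.abs(w)))
    scale = np.float32(max_abs / 127.0 if max_abs > 0 else 1.0)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.float32) -> np.ndarray:
    """Recover approximate float32 weights: w ≈ q * scale."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_symmetric_per_tensor(w)
print(q.dtype, float(np.max(np.abs(dequantize(q, s) - w))) <= float(s) / 2 + 1e-5)
```

Rounding bounds the per-weight error by half a quantization step (`scale / 2`). A single outlier weight enlarges `scale` for the whole tensor, which is the main accuracy risk of the per-tensor variant.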
## Author & Contact
- **Author:** Dario Finardi
- **Company:** [Semplifica](https://semplifica.ai)
- **Email:** hf@semplifica.ai