---
language:
- en
- it
- fr
- es
- de
- pt
tags:
- temporal-normalization
- byt5
- onnx
- medical
---

# Semplifica T5 Temporal Normalizer

## Model Description

**Semplifica T5 Temporal Normalizer** is a fine-tuned version of Google's [ByT5-Small](https://huggingface.co/google/byt5-small) specifically designed to solve a complex NLP problem: **normalizing noisy, slang, relative, and incomplete temporal expressions** into standard ISO formats (`YYYY-MM-DD` or `HH:MM`).

By operating at the character level (UTF-8 bytes), ByT5 is intrinsically robust to typos, dirty OCR output, and out-of-vocabulary (OOV) tokens, making it exceptionally reliable for real-world, messy documents.

The model expects an **anchor date** (reference date), an optional **language code**, and the **temporal string** as input:

> Input format: `YYYY-MM-DD | lang (optional) | input_text`
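Assembling that input string is trivial; here is a minimal sketch (the `build_input` helper is illustrative, not part of this repository):

```python
def build_input(anchor_date: str, text: str, lang: str = None) -> str:
    """Assemble the model prompt: anchor date, optional language code, raw text."""
    parts = [anchor_date] + ([lang] if lang else []) + [text]
    return " | ".join(parts)

print(build_input("2024-01-01", "3 days post admission", "en"))
# Output: 2024-01-01 | en | 3 days post admission
```

When the language code is omitted, the model falls back to detecting the language from the text itself.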
## Use Cases

1. **Clinical & Medical (EHR) — Primary:** Extract precise timelines from Electronic Health Records where doctors use extreme abbreviations ("3 days post-op", "admission + 2").
2. **Legal & Compliance:** Analyze legal contracts with relative deadlines ("within 30 days from signature").
3. **Conversational AI & Booking:** Chatbots processing user requests like "book a flight for next Tuesday afternoon".
4. **Logistics & Supply Chain:** Parsing informal shipping emails ("expected delivery in 2 days").
## Hardware Portability & ONNX

A core goal of this model is **universal portability**. It has been exported to **ONNX** in three precision formats:

| Format | Size | Notes |
|--------|------|-------|
| FP32 | ~1.14 GB | Full precision (encoder and decoder separated), validation reference |
| FP16 | ~738 MB | Half precision, ideal for GPU/NPU with Tensor Cores |
| INT8 | ~290 MB | Symmetric per-tensor weight quantization (~75% smaller than FP32), ideal for CPU / edge / Rust |
## Evaluation Metrics (ONNX Runtime)

Tested on GPU (`CUDAExecutionProvider`) with a 1,000-record evaluation sample:

| Model Format | Size | Exact Match Accuracy | F1 (Macro) | Throughput (samples/s) |
|--------------|------|----------------------|------------|------------------------|
| **FP32** | ~1.14 GB | 99.40% | 99.53% | ~44.0 |
| **FP16** | ~738 MB | 99.40% | 99.53% | ~39.8 |
| **INT8** | ~290 MB | 99.40% | 99.53% | ~31.7 |

---
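Exact-match accuracy is easy to reproduce on your own samples; a minimal sketch (the `exact_match` helper is illustrative, not part of the evaluation code in this repository):

```python
def exact_match(preds, golds):
    """Share of predictions identical to the gold string after trimming whitespace."""
    hits = sum(p.strip() == g.strip() for p, g in zip(preds, golds))
    return hits / len(golds)

preds = ["2024-01-04", "14:30", "2024-02-01"]
golds = ["2024-01-04", "14:30", "2024-02-02"]
print(exact_match(preds, golds))  # 2 of 3 match
```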
| |
| ## Usage in Python (HuggingFace Transformers) |
| |
| ```python |
| from transformers import AutoTokenizer, AutoModelForSeq2SeqLM |
| |
| model_id = "SemplificaAI/t5-temporal-normalizer" |
| # Important: always load the tokenizer from the base model to avoid a known |
| # ByT5 tokenizer serialization bug in transformers >= 5.x |
| tokenizer = AutoTokenizer.from_pretrained("google/byt5-small") |
| model = AutoModelForSeq2SeqLM.from_pretrained(model_id) |
| |
| # Format: YYYY-MM-DD | lang (optional) | text |
| input_text = "2024-01-01 | en | 3 days post admission" |
| inputs = tokenizer(input_text, return_tensors="pt") |
|
|
| outputs = model.generate(**inputs, max_length=16) |
| # Use skip_special_tokens=False + manual cleanup to avoid a deadlock bug |
| # in transformers >= 5.x with skip_special_tokens=True |
| result = tokenizer.decode(outputs[0], skip_special_tokens=False) |
| result = result.replace("<pad>", "").replace("</s>", "").strip() |
| |
| print(result) |
| # Output: 2024-01-04 |
| ``` |

## Usage in Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
opts = ort.SessionOptions()
enc_sess = ort.InferenceSession("byt5_encoder_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"])
dec_sess = ort.InferenceSession("byt5_decoder_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"])

input_text = "2024-01-01 | en | 3 days post admission"
enc = tokenizer(input_text, return_tensors="np", max_length=64, padding="max_length", truncation=True)

# 1. Encoder forward pass
enc_hs = enc_sess.run(None, {
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],
})[0]

# 2. Autoregressive greedy decode loop
MAX_OUT_LEN = 16
PAD_ID = 0  # also the T5 decoder start token
EOS_ID = 1

cur_ids = np.zeros((1, MAX_OUT_LEN), dtype=np.int64)
cur_mask = np.zeros((1, MAX_OUT_LEN), dtype=np.int64)
cur_ids[0, 0] = PAD_ID
cur_mask[0, 0] = 1

generated = []

for step in range(MAX_OUT_LEN - 1):
    logits = dec_sess.run(None, {
        "decoder_input_ids": cur_ids,
        "decoder_attention_mask": cur_mask,
        "encoder_hidden_states": enc_hs,
        "encoder_attention_mask": enc["attention_mask"],
    })[0]

    next_tok = int(np.argmax(logits[0, step]))
    if next_tok == EOS_ID:
        break
    generated.append(next_tok)

    cur_ids[0, step + 1] = next_tok
    cur_mask[0, step + 1] = 1

# ByT5 decoding: token_id - 3 = raw UTF-8 byte (tokens 0/1/2 are special)
output_text = bytes([t - 3 for t in generated if t >= 3]).decode("utf-8", errors="ignore")
print("Prediction:", output_text)
```

## Usage in Go (ONNX Runtime)

A highly optimized Go evaluation pipeline is available in the `go_eval` directory, demonstrating the separation of encoder and decoder execution with pre-allocated tensors and fixed sequence padding (`MAX_OUT_LEN = 16`). It supports fallback to `CUDAExecutionProvider`.

```go
package main

import (
	ort "github.com/yalue/onnxruntime_go"
)

func main() {
	ort.SetSharedLibraryPath("libonnxruntime.so")
	ort.InitializeEnvironment()
	defer ort.DestroyEnvironment()

	// Load separated ONNX models
	encSess, _ := ort.NewAdvancedSession("byt5_encoder_fp32.onnx", /* ... */)
	decSess, _ := ort.NewAdvancedSession("byt5_decoder_fp32.onnx", /* ... */)

	// 1. Encoder pass
	_ = encSess.Run()

	// 2. Decoder autoregressive loop with fixed mask
	for step := 0; step < 15; step++ {
		_ = decSess.Run()
		// Get step logits, argmax, and update input buffer
	}
}
```

## Usage in Rust (ONNX Runtime)

For production environments, use the [`ort`](https://github.com/pykeio/ort) crate. Since T5 is an encoder-decoder architecture, generation requires an autoregressive loop.

```toml
# Cargo.toml
[dependencies]
ort = "2.0"
```

```rust
use ort::{GraphOptimizationLevel, Session};

fn main() -> ort::Result<()> {
    let session = Session::builder()?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_intra_threads(4)?
        .commit_from_file("byt5_encoder_fp32.onnx")?;

    // ByT5 tokenization: each UTF-8 byte maps to token_id = byte + 3
    // (0=pad, 1=eos, 2=unk, then 3..258 = bytes 0..255)
    // Load both encoder and decoder sessions, then run the autoregressive loop with fixed-size padding

    Ok(())
}
```
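The byte-level tokenization noted in the comments above needs no tokenizer library at all; a minimal sketch in Python (the helper names are illustrative):

```python
SPECIAL_OFFSET = 3  # 0 = <pad>, 1 = </s>, 2 = <unk>; raw bytes start at 3
EOS_ID = 1

def byt5_encode(text: str) -> list:
    """Map each UTF-8 byte to token_id = byte + 3, then append EOS."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]

def byt5_decode(ids) -> str:
    """Drop special tokens, shift back by 3, and decode the raw bytes."""
    raw = bytes(t - SPECIAL_OFFSET for t in ids if t >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="ignore")

ids = byt5_encode("2024-01-04")
print(ids[0])               # byte '2' is 50, so the first token is 53
print(byt5_decode(ids))     # round-trips back to "2024-01-04"
```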

## Technical Notes

- **ByT5 Tokenizer:** Each UTF-8 byte maps to `token_id = byte_value + 3`. Tokens 0/1/2 are PAD/EOS/UNK. Always load the tokenizer from `google/byt5-small` — the fine-tuned checkpoint may have a corrupted tokenizer config due to a known serialization bug in `transformers >= 5.x`.
- **ONNX Export:** Exported with `torch.onnx.export(dynamo=True)` + `onnxscript`. The old JIT tracer (`dynamo=False`) is incompatible with the new masking utilities in `transformers >= 5.x`.
- **INT8 Quantization:** Symmetric per-tensor quantization applied directly to the ONNX graph initializers (numpy-based). PyTorch `quantize_dynamic` models are not exportable via the dynamo exporter (`LinearPackedParamsBase` is not serializable by `torch.export`).
- **ONNX Architecture:** To overcome issues with ByT5 relative positional embeddings dynamically broadcasting at runtime, the model is exported as a **separated encoder and decoder**. The decoder expects a fixed-length sequence of 16, which is updated sequentially using a padding mask during the autoregressive loop (see the Python and Rust examples above).
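
Symmetric per-tensor INT8 quantization, as applied to the graph initializers, can be sketched in a few lines of numpy (a minimal illustration, not the actual export script):

```python
import numpy as np

def quantize_symmetric_per_tensor(w: np.ndarray):
    """One scale for the whole tensor; zero-point fixed at 0 (symmetric)."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 1.27], dtype=np.float32)
q, scale = quantize_symmetric_per_tensor(w)
w_hat = dequantize(q, scale)
# Reconstruction error is bounded by half a quantization step
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Storing `q` (1 byte per weight) plus one `scale` per tensor in place of FP32 initializers is what yields the ~75% size reduction in the table above.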

## Author & Contact

- **Author:** Dario Finardi
- **Company:** [Semplifica](https://semplifica.ai)
- **Email:** hf@semplifica.ai