dariofinardi committed
Commit 289453a · verified · Parent(s): 707c8e8

Upload README.md with huggingface_hub

Files changed (1): README.md (+198 −3)

README.md CHANGED
---
language:
- en
- it
- fr
- es
- de
- pt
tags:
- temporal-normalization
- byt5
- onnx
- medical
---

# Semplifica T5 Temporal Normalizer

## Model Description

**Semplifica T5 Temporal Normalizer** is a fine-tuned version of Google's [ByT5-Small](https://huggingface.co/google/byt5-small), designed to solve a hard NLP problem: **normalizing noisy, slangy, relative, and incomplete temporal expressions** into standard ISO formats (`YYYY-MM-DD` or `HH:MM`).

By operating at the character level (UTF-8 bytes), ByT5 is intrinsically immune to typos, dirty OCR output, and out-of-vocabulary (OOV) tokens, which makes it exceptionally reliable on real-world, messy documents.

The model expects an **anchor date** (reference date), an optional **language code**, and the **temporal string** as input:

> Input format: `YYYY-MM-DD | lang | input_text`
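
Assembling the anchored prompt is a one-liner; the sketch below uses a helper name (`build_input`) of our own invention, not an API shipped with the model:

```python
from datetime import date

def build_input(text: str, anchor: date, lang: str = "en") -> str:
    # Assemble the expected "YYYY-MM-DD | lang | input_text" prompt
    return f"{anchor.isoformat()} | {lang} | {text}"

print(build_input("3 days post admission", date(2024, 1, 1)))
# → 2024-01-01 | en | 3 days post admission
```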

## Use Cases

1. **Clinical & Medical (EHR) — primary:** Extract precise timelines from Electronic Health Records where doctors use extreme abbreviations ("3 days post-op", "admission + 2").
2. **Legal & Compliance:** Analyze contracts with relative deadlines ("within 30 days from signature").
3. **Conversational AI & Booking:** Handle chatbot requests like "book a flight for next Tuesday afternoon".
4. **Logistics & Supply Chain:** Parse informal shipping emails ("expected delivery in 2 days").

## Hardware Portability & ONNX

A core goal of this model is **universal portability**. It has been exported to **ONNX** in three precision formats:

| Format | Size | Notes |
|--------|------|-------|
| FP32 | ~1.14 GB | Full precision (separate encoder and decoder), validation reference |
| FP16 | ~738 MB | Half precision, ideal for GPUs/NPUs with Tensor Cores |
| INT8 | ~290 MB | Symmetric per-tensor weight quantization (~75% smaller than FP32), ideal for CPU / edge / Rust |

## Evaluation Metrics (ONNX Runtime)

Tested on GPU (`CUDAExecutionProvider`) with a 1,000-record evaluation sample:

| Model Format | Size | Exact Match Accuracy | F1 (Macro) | Throughput (samples/s) |
|--------------|------|----------------------|------------|------------------------|
| **FP32** | ~1.14 GB | 99.40% | 99.53% | ~44.0 |
| **FP16** | ~738 MB | 99.40% | 99.53% | ~39.8 |
| **INT8** | ~290 MB | 99.40% | 99.53% | ~31.7 |
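
"Exact match" here is taken to mean the generated string equals the reference ISO string character-for-character after trimming whitespace; a minimal scorer sketch (the `exact_match` helper is ours, not part of the evaluation harness):

```python
def exact_match(preds: list[str], golds: list[str]) -> float:
    # Fraction of predictions that match the reference string exactly
    assert len(preds) == len(golds)
    return sum(p.strip() == g.strip() for p, g in zip(preds, golds)) / len(golds)

print(exact_match(["2024-01-04", "14:30"], ["2024-01-04", "14:00"]))  # 0.5
```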

---

## Usage in Python (HuggingFace Transformers)

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "SemplificaAI/t5-temporal-normalizer"
# Important: always load the tokenizer from the base model to avoid a known
# ByT5 tokenizer serialization bug in transformers >= 5.x
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Format: YYYY-MM-DD | lang | text
input_text = "2024-01-01 | en | 3 days post admission"
inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**inputs, max_length=16)
# Use skip_special_tokens=False + manual cleanup to avoid a deadlock bug
# in transformers >= 5.x with skip_special_tokens=True
result = tokenizer.decode(outputs[0], skip_special_tokens=False)
result = result.replace("<pad>", "").replace("</s>", "").strip()

print(result)
# Output: 2024-01-04
```
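
Because ByT5 operates on raw UTF-8 bytes, its tokenizer mapping can be reproduced in a few lines without loading `transformers` at all. This sketch of ours implements the `token_id = byte + 3` rule described in the technical notes (0 = pad, 1 = EOS, 2 = unk):

```python
def byt5_encode(text: str) -> list[int]:
    # ByT5 vocabulary: 0 = <pad>, 1 = </s>, 2 = <unk>, then 3..258 cover bytes 0..255
    return [b + 3 for b in text.encode("utf-8")] + [1]  # append EOS

def byt5_decode(ids: list[int]) -> str:
    # Drop special tokens (< 3) and shift back to raw bytes
    return bytes(i - 3 for i in ids if i >= 3).decode("utf-8", errors="ignore")

ids = byt5_encode("2024-01-04")
print(ids[0])            # '2' is byte 50, so the first id is 53
print(byt5_decode(ids))  # → 2024-01-04
```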

## Usage in Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
opts = ort.SessionOptions()
enc_sess = ort.InferenceSession("byt5_encoder_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"])
dec_sess = ort.InferenceSession("byt5_decoder_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"])

input_text = "2024-01-01 | en | 3 days post admission"
enc = tokenizer(input_text, return_tensors="np", max_length=64, padding="max_length", truncation=True)

# 1. Encoder forward pass
enc_hs = enc_sess.run(None, {
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],
})[0]

# 2. Autoregressive greedy decode loop
MAX_OUT_LEN = 16
PAD_ID = 0  # also the T5 decoder start token
EOS_ID = 1

cur_ids = np.zeros((1, MAX_OUT_LEN), dtype=np.int64)
cur_mask = np.zeros((1, MAX_OUT_LEN), dtype=np.int64)
cur_ids[0, 0] = PAD_ID
cur_mask[0, 0] = 1

generated = []

for step in range(MAX_OUT_LEN - 1):
    logits = dec_sess.run(None, {
        "decoder_input_ids": cur_ids,
        "decoder_attention_mask": cur_mask,
        "encoder_hidden_states": enc_hs,
        "encoder_attention_mask": enc["attention_mask"],
    })[0]

    next_tok = int(np.argmax(logits[0, step]))
    if next_tok == EOS_ID:
        break
    generated.append(next_tok)

    cur_ids[0, step + 1] = next_tok
    cur_mask[0, step + 1] = 1

# ByT5: token_id = byte + 3, so shift back and drop special tokens (0/1/2)
output_text = bytes([t - 3 for t in generated if t >= 3]).decode("utf-8", errors="ignore")
print("Prediction:", output_text)
```

## Usage in Go (ONNX Runtime)

A highly optimized Go evaluation pipeline is available in the `go_eval` directory, demonstrating separate encoder and decoder execution with pre-allocated tensors and fixed sequence padding (`MAX_OUT_LEN = 16`). It supports falling back to `CUDAExecutionProvider`.

```go
package main

import (
	ort "github.com/yalue/onnxruntime_go"
)

func main() {
	ort.SetSharedLibraryPath("libonnxruntime.so")
	ort.InitializeEnvironment()
	defer ort.DestroyEnvironment()

	// Load the separated ONNX models
	encSess, _ := ort.NewAdvancedSession("byt5_encoder_fp32.onnx", /* ... */)
	decSess, _ := ort.NewAdvancedSession("byt5_decoder_fp32.onnx", /* ... */)

	// 1. Encoder pass
	_ = encSess.Run()

	// 2. Decoder autoregressive loop with fixed mask
	for step := 0; step < 15; step++ {
		_ = decSess.Run()
		// Read this step's logits, take the argmax, and update the input buffer
	}
}
```

## Usage in Rust (ONNX Runtime)

For production environments, use the [`ort`](https://github.com/pykeio/ort) crate. Since T5 is an encoder-decoder architecture, generation requires an autoregressive loop.

```toml
# Cargo.toml
[dependencies]
ort = "2.0"
```

```rust
use ort::{GraphOptimizationLevel, Session};

fn main() -> ort::Result<()> {
    let session = Session::builder()?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_intra_threads(4)?
        .commit_from_file("byt5_encoder_fp32.onnx")?;

    // ByT5 tokenization: each UTF-8 byte maps to token_id = byte + 3
    // (0 = pad, 1 = eos, 2 = unk, then 3..=258 = bytes 0..=255)
    // Load both encoder and decoder sessions, then run the autoregressive
    // loop with fixed-size padding.

    Ok(())
}
```

## Technical Notes

- **ByT5 Tokenizer:** Each UTF-8 byte maps to `token_id = byte_value + 3`. Tokens 0/1/2 are PAD/EOS/UNK. Always load the tokenizer from `google/byt5-small` — the fine-tuned checkpoint may have a corrupted tokenizer config due to a known serialization bug in `transformers >= 5.x`.
- **ONNX Export:** Exported with `torch.onnx.export(dynamo=True)` + `onnxscript`. The old JIT tracer (`dynamo=False`) is incompatible with the new masking utilities in `transformers >= 5.x`.
- **INT8 Quantization:** Symmetric per-tensor quantization applied directly to the ONNX graph initializers (NumPy-based). PyTorch `quantize_dynamic` models are not exportable via the dynamo exporter (`LinearPackedParamsBase` is not serializable by `torch.export`).
- **ONNX Architecture:** To avoid issues with ByT5's relative positional embeddings broadcasting dynamically at runtime, the model is exported as a **separated encoder and decoder**. The decoder expects a fixed-length sequence of 16, which is filled sequentially via a padding mask during the autoregressive loop (see the Python and Rust examples above).
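
The symmetric per-tensor scheme mentioned above can be sketched in a few lines of NumPy. This is our own illustration of the idea, not the actual export script:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray):
    # Symmetric per-tensor: one scale for the whole tensor, zero-point fixed at 0
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01], dtype=np.float32)
q, s = quantize_symmetric(w)
print(q, s)  # q = [50, -127, 1], scale ≈ 0.01
```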