T5Gemma: FP16-mixed (FP32 attention island) — fixes BF16 numerical bug

The previous BF16 build had a numerical bug: cosine similarity against the
PyTorch FP32 reference dropped to 0.17 at specific tokens (for example, the
' beautiful' token, id 4964, produced output activations of magnitude 26
instead of 52). As a result, some tokens silently produced bad conditioning
for the downstream DiT.

Switch to an FP16 trunk with an FP32 attention island (STRONGLY_TYPED
network), the same strategy that fixed the SAME-L decoder's accuracy. Cosine
similarity against FP32 PyTorch is now 0.999998; the worst per-token cosine
across our test prompts is 0.9987.
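
For reference, a per-token check of this kind could look like the sketch below. It is not the script used for the numbers above: the function names are hypothetical, and it assumes both encoder outputs are available as nested lists of shape [tokens][hidden]. Because cosine similarity ignores scale, the sketch also reports the norm ratio, which is what would expose a magnitude-26-versus-52 discrepancy.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def norm_ratio(ref_vec, test_vec):
    """L2 norm of the test vector divided by the L2 norm of the reference."""
    ref_norm = math.sqrt(sum(x * x for x in ref_vec))
    test_norm = math.sqrt(sum(x * x for x in test_vec))
    return test_norm / ref_norm if ref_norm else float("inf")


def worst_token(ref, test):
    """Return (index, cosine, norm_ratio) for the token with the lowest cosine.

    ref, test: per-token hidden states, shape [tokens][hidden] (assumed layout).
    """
    if len(ref) != len(test):
        raise ValueError("token counts differ")
    scores = [cosine(r, t) for r, t in zip(ref, test)]
    idx = min(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx], norm_ratio(ref[idx], test[idx])
```

Running `worst_token` over every test prompt and keeping the minimum gives a worst-token figure comparable in spirit to the 0.9987 quoted above.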

Engine filename: t5gemma_bf16.trt → t5gemma_fp16mixed.trt.
Engine size: 564 MB → 623 MB (+10%).
Latency: 0.78 ms → 0.91 ms (negligible vs DiT's 50+ ms).
ONNX: re-exported with mixed dtypes; STRONGLY_TYPED build respects them.
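
A minimal sketch of a strongly typed build from the re-exported ONNX follows. It is an illustration, not the contents of build_from_onnx.py: the function name is hypothetical, it assumes a TensorRT release whose Python API exposes `NetworkDefinitionCreationFlag.STRONGLY_TYPED`, and it assumes static input shapes (dynamic shapes would additionally need an optimization profile).

```python
import os


def build_strongly_typed_engine(onnx_path, engine_path):
    """Build a TensorRT engine whose tensor types follow the ONNX dtypes."""
    if not os.path.isfile(onnx_path):
        raise FileNotFoundError(onnx_path)

    # Imported lazily so the path check above works without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # STRONGLY_TYPED: types are taken from the network itself, so FP32 casts
    # around attention in the exported graph are kept as FP32.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)
    network = builder.create_network(flags)

    parser = trt.OnnxParser(network, logger)
    if not parser.parse_from_file(onnx_path):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    # No FP16/BF16 builder flags are set: with a strongly typed network the
    # precision comes from the graph, not from builder precision flags.
    config = builder.create_builder_config()
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")

    with open(engine_path, "wb") as f:
        f.write(serialized)
```

A call such as `build_strongly_typed_engine("onnx/t5gemma/encoder.onnx", "t5gemma_fp16mixed.trt")` would use the paths named in this change; the target GPU and any workspace limits are left at their defaults here.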

Consumer code (sa3_trt.py, install.sh, build.py, build_from_onnx.py) has been
updated in the GitHub repo to point at the new filename.

Files changed (2)

onnx/t5gemma/encoder.onnx +2 -2
tensorRT/sm_90/t5gemma/{t5gemma_bf16.trt → t5gemma_fp16mixed.trt} +2 -2

onnx/t5gemma/encoder.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:45bb0d030adfb4a16d1d900ce626bafb2d2af6e4d49b7bb584358eabb448be1f
-size 1126948438
+oid sha256:dc79684b14c9d5647bf3e7870f388837ac4b8d6708ea3ead82a37a0b39803084
+size 620393530

tensorRT/sm_90/t5gemma/{t5gemma_bf16.trt → t5gemma_fp16mixed.trt} RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a57d9fb287a4e8aa5bb9b9b62ed940405cd8c21337854b9d44c720535664f6cf
-size 564220452
+oid sha256:1d33c35470e8cc44a919bd7b4a7959dc9adb9796457d8e7571b55a90b18046e3
+size 622964388
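
The size figures in the description can be cross-checked against the Git LFS pointers in the diff. The sketch below parses the two engine pointers and computes the relative change; the helper names are hypothetical, and the pointer text is copied from the diff for this file.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, oid and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size in bytes
    }


def size_change(old_text, new_text):
    """Relative size change between two pointers (0.10 means +10%)."""
    old = parse_lfs_pointer(old_text)["size"]
    new = parse_lfs_pointer(new_text)["size"]
    return (new - old) / old


OLD_ENGINE = """version https://git-lfs.github.com/spec/v1
oid sha256:a57d9fb287a4e8aa5bb9b9b62ed940405cd8c21337854b9d44c720535664f6cf
size 564220452
"""

NEW_ENGINE = """version https://git-lfs.github.com/spec/v1
oid sha256:1d33c35470e8cc44a919bd7b4a7959dc9adb9796457d8e7571b55a90b18046e3
size 622964388
"""

# Prints the engine size change as a signed percentage.
print(f"{size_change(OLD_ENGINE, NEW_ENGINE):+.1%}")
```

The result agrees with the rounded "+10%" given for the engine size above.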