Decoder TRT engines: PCM-baked-in (sm_90)

Companion to the previous commit's ONNX update: the published sm_90 .trt
engines for the SAME-S and SAME-L decoders now have the same
clamp + scale + cast(int32) + permute postprocess baked into the engine graph.
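
For reference, the four baked-in ops can be sketched on the host side with
NumPy. This is only an illustration of the op sequence named above, not the
engine's actual graph: the clamp range [-1, 1], the scale factor 32767, the
(batch, channels, samples) input layout, and the function name
pcm_postprocess are all assumptions that do not appear in this commit.

```python
import numpy as np

def pcm_postprocess(audio, scale=32767.0):
    """Hypothetical host-side equivalent of clamp + scale + cast(int32) + permute.

    Assumes `audio` is float data shaped (batch, channels, samples); the clamp
    range and scale factor are illustrative guesses, not values from the engine.
    """
    clamped = np.clip(audio, -1.0, 1.0)      # clamp to the assumed float range
    scaled = clamped * scale                 # scale to the assumed integer range
    pcm = scaled.astype(np.int32)            # cast; truncates toward zero
    return np.transpose(pcm, (0, 2, 1))      # permute to (batch, samples, channels)

if __name__ == "__main__":
    x = np.array([[[2.0, -2.0, 0.5],
                   [0.0, 1.0, -1.0]]])       # (1 batch, 2 channels, 3 samples)
    y = pcm_postprocess(x)
    print(y.shape, y.dtype)
```

With the ops inside the engine, a caller receives the integer, sample-major
tensor directly and no longer needs a step like this on the host.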

Drop-in compatible: sa3_trt.py auto-detects the engine flavor by output
tensor name (pcm vs. audio) and skips the Stage 5 clip/cast/transpose when
the engine is pcm-baked. Stage 5 wall time drops from ~110 ms to ~21 ms on
sm-music + same-s for 30 s of audio.
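
The name-based detection described above could look roughly like the sketch
below. It is not the code in sa3_trt.py: the function names and the flavor
labels are hypothetical, and only the tensor names pcm and audio come from
this commit. The output names would typically be read from the deserialized
engine (for example via the TensorRT Python API's get_tensor_name and
get_tensor_mode), which is left out here so the sketch stays stdlib-only.

```python
def detect_engine_flavor(output_names):
    """Classify an engine by its output tensor names (hypothetical helper).

    Returns "pcm-baked" when an output named "pcm" is present and
    "float-audio" when an output named "audio" is present.
    """
    names = set(output_names)
    if "pcm" in names:
        return "pcm-baked"
    if "audio" in names:
        return "float-audio"
    raise ValueError(f"unrecognized decoder outputs: {sorted(names)}")

def needs_stage5_postprocess(output_names):
    """Host-side clip/cast/transpose is only needed for the float flavor."""
    return detect_engine_flavor(output_names) == "float-audio"

if __name__ == "__main__":
    print(needs_stage5_postprocess(["pcm"]))    # new engines from this commit
    print(needs_stage5_postprocess(["audio"]))  # older engines
```

Keying on the tensor name rather than on the file name or a version flag
lets older and newer engines share the same path on disk, which is what
makes the replacement drop-in.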

Files changed (2)

tensorRT/sm_90/same-l/dec_dynamic_triton_swa.trt +2 -2
tensorRT/sm_90/same-s/dec_dynamic_bf16.trt +2 -2

tensorRT/sm_90/same-l/dec_dynamic_triton_swa.trt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:22a97b3a357be2c1a8cf48591b31e8820597ef2f577b79d344eb58e6ea021521
-size 1198467036
+oid sha256:e91745733080485e5c725d1a0a369c40b7a932587f0a649583330b49165c6c99
+size 1198545588

tensorRT/sm_90/same-s/dec_dynamic_bf16.trt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a1454c16b0015f72e2a8c5671efdcf592a63000d111ac454bfdea42eedb52dd0
-size 115100652
+oid sha256:1a389e4611448da804a9d88d4383bd199752f722a333fc9b07ea307f319e82bd
+size 115053444