Cortexelus committed
Commit 106e6fd · 1 Parent(s): 459fd8d

Decoder TRT engines: PCM-baked-in (sm_90)

Companion to the previous commit's ONNX update: the published sm_90 .trt
engines for the SAME-S and SAME-L decoders now have the same clamp + scale +
cast(int32) + permute postprocess baked into the engine graph.
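
For readers unfamiliar with this kind of postprocess, the four steps can be sketched in plain Python. This is an illustration only, not the engine graph itself: the clamp range of [-1, 1], the scale factor of 32767, the channel-major input layout, and the name `pcm_postprocess` are all assumptions, since the commit message does not state them.

```python
def pcm_postprocess(audio, scale=32767.0):
    """Clamp, scale, cast to int, and permute a float waveform.

    audio: list of channels, each a list of float samples (channel-major).
    Returns a list of frames, each a list of ints (sample-major).
    The clamp range and scale are assumed values, not taken from the commit.
    """
    scaled = [
        # clamp to [-1, 1], scale, then cast to int (truncates toward zero)
        [int(max(-1.0, min(1.0, sample)) * scale) for sample in channel]
        for channel in audio
    ]
    # permute (channels, samples) -> (samples, channels)
    return [list(frame) for frame in zip(*scaled)]


if __name__ == "__main__":
    stereo = [[0.0, 1.5, -2.0], [0.5, -0.5, 1.0]]
    print(pcm_postprocess(stereo))
```

Baking these steps into the engine means the host receives integer PCM directly instead of running them as a separate pass over the float output.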

Drop-in compatible: sa3_trt.py auto-detects engine flavor by output tensor
name (pcm vs audio) and skips Stage 5 clip/cast/transpose when pcm-baked.
Stage 5 wall time drops from ~110ms to ~21ms on sm-music + same-s, 30s
audio.
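
The flavor detection described above can be sketched as a lookup on the engine's output tensor names. This is a minimal sketch, not the code in sa3_trt.py: the helper names `engine_flavor` and `needs_stage5` are hypothetical, and it assumes the output tensors are named exactly `pcm` and `audio`. The names are passed in as a plain list so the example runs without TensorRT installed.

```python
def engine_flavor(output_names):
    """Classify an engine by its output tensor names.

    Returns "pcm" for a PCM-baked engine and "audio" for one that still
    emits a float waveform. The literal tensor names are an assumption.
    """
    names = set(output_names)
    if "pcm" in names:
        return "pcm"
    if "audio" in names:
        return "audio"
    raise ValueError(f"unrecognized decoder outputs: {sorted(names)}")


def needs_stage5(output_names):
    """Host-side clip/cast/transpose is only needed for the float flavor."""
    return engine_flavor(output_names) != "pcm"


if __name__ == "__main__":
    print(needs_stage5(["pcm"]))    # PCM-baked engine: skip Stage 5
    print(needs_stage5(["audio"]))  # older engine: run Stage 5
```

Keying on the tensor name rather than a version flag is what lets old and new engine files be swapped in without a configuration change.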

tensorRT/sm_90/same-l/dec_dynamic_triton_swa.trt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:22a97b3a357be2c1a8cf48591b31e8820597ef2f577b79d344eb58e6ea021521
-size 1198467036
+oid sha256:e91745733080485e5c725d1a0a369c40b7a932587f0a649583330b49165c6c99
+size 1198545588
tensorRT/sm_90/same-s/dec_dynamic_bf16.trt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a1454c16b0015f72e2a8c5671efdcf592a63000d111ac454bfdea42eedb52dd0
-size 115100652
+oid sha256:1a389e4611448da804a9d88d4383bd199752f722a333fc9b07ea307f319e82bd
+size 115053444