stabilityai/stable-audio-3-optimized
DiT engines: replace BF16 with FP16-mixed (FP32 islands) + add FP32 variants
Drop dit_bf16.trt for all three SA3 DiTs (sm-music, sm-sfx, sa3-m). BF16
quantization error compounds over the 8 pingpong sampling steps: cos-sim
against the PyTorch (PT) FP32 reference drifts from 0.99 after a single
step to 0.81 on the final latent, which produces audibly degraded output.
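A minimal sketch of how this kind of drift can be measured, assuming "cos-sim" means cosine similarity between flattened latents. `to_bf16` emulates BF16 by rounding a float32 to its top 16 bits (round-to-nearest-even). `drift_demo` is a hypothetical toy residual map, not the SA3 sampler, and it will not reproduce the 0.81 figure. It only shows the per-step tracking pattern.

```python
import math
import random
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest BF16 value (ties to even).

    x is first packed as float32; BF16 keeps its top 16 bits
    (sign, 8 exponent bits, 7 mantissa bits). NaN/Inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half of the dropped range, plus 1 when the kept LSB is odd.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length flattened vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def drift_demo(steps=8, dim=64, seed=0):
    """Run a toy residual map in float64 and in emulated BF16.

    Returns the cos-sim between the two states after each step.
    """
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]
    ref = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    low = [to_bf16(v) for v in ref]
    sims = []
    for _ in range(steps):
        # Reference path: plain float64.
        ref = [xi + math.tanh(sum(wij * xj for wij, xj in zip(row, ref)))
               for row, xi in zip(w, ref)]
        # BF16 path: weights and per-step results are rounded to BF16;
        # the dot-product accumulation itself stays in float64.
        low = [to_bf16(xi + to_bf16(math.tanh(sum(to_bf16(wij) * xj
                                                 for wij, xj in zip(row, low)))))
               for row, xi in zip(w, low)]
        sims.append(cosine_similarity(ref, low))
    return sims
```

Printing `drift_demo()` shows one similarity per step. Because rounding happens every step and feeds the next one, the error can only accumulate, never be recovered.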
Replace it with dit_fp16mixed.trt (the canonical engine from now on): an
FP16 trunk with FP32 islands around every RMSNorm chain
(Pow+ReduceMean+Sqrt+Mul), every attention Softmax, and the RoPE region
(anything reachable from a Cast(to=FP32) feeding Cos/Sin/Einsum). That is
140 RMSNorms + 40 Softmaxes per sm-music block, more for medium. Built
with STRONGLY_TYPED so TRT honors the explicit dtypes (no
auto-promotion). This matches MLX's "FP16 with implicit FP32 reductions
via fused kernels" recipe.
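The island-selection rules above can be sketched as a graph pass. This is a hypothetical illustration on a toy, ONNX-like graph (`Node`, string dtypes, and `toy_graph` are assumptions, not the tooling used for this commit): mark every Softmax, every Pow→ReduceMean→(Add)→Sqrt→(Reciprocal/Div)→Mul chain, and everything downstream of a Cast(to=FP32) that feeds Cos/Sin/Einsum. Stopping the RoPE expansion at the next Cast is an added assumption, since literal reachability would cover the rest of the graph.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical, minimal stand-in for an ONNX graph node."""
    name: str
    op: str
    inputs: List[str] = field(default_factory=list)
    cast_to: Optional[str] = None  # target dtype, only meaningful for Cast


def build_consumers(nodes: Dict[str, Node]) -> Dict[str, List[str]]:
    """Map each node name to the names of nodes that consume its output."""
    consumers = defaultdict(list)
    for node in nodes.values():
        for src in node.inputs:
            consumers[src].append(node.name)
    return consumers


RMSNORM_INNER = {"ReduceMean", "Add", "Sqrt", "Reciprocal", "Div"}


def rmsnorm_islands(nodes, consumers) -> Set[str]:
    """Follow single-consumer chains from each Pow until a closing Mul."""
    marked = set()
    for start in nodes.values():
        if start.op != "Pow":
            continue
        chain, seen_ops, cur = [start.name], {"Pow"}, start.name
        while len(consumers.get(cur, [])) == 1:
            cur = consumers[cur][0]
            chain.append(cur)
            seen_ops.add(nodes[cur].op)
            if nodes[cur].op not in RMSNORM_INNER:
                break  # a Mul closes the chain; any other op aborts the match
        if nodes[chain[-1]].op == "Mul" and {"ReduceMean", "Sqrt"} <= seen_ops:
            marked.update(chain)
    return marked


def rope_islands(nodes, consumers) -> Set[str]:
    """Nodes reachable from a Cast(to=FP32) that feeds Cos/Sin/Einsum.

    Assumption: expansion stops at the next Cast, kept as the boundary.
    """
    seeds = [
        n.name for n in nodes.values()
        if n.op == "Cast" and n.cast_to == "FP32"
        and any(nodes[c].op in {"Cos", "Sin", "Einsum"} for c in consumers.get(n.name, []))
    ]
    marked, queue = set(seeds), deque(seeds)
    while queue:
        cur = queue.popleft()
        for nxt in consumers.get(cur, []):
            if nxt in marked:
                continue
            marked.add(nxt)
            if nodes[nxt].op != "Cast":
                queue.append(nxt)
    return marked


def fp32_islands(nodes: Dict[str, Node]) -> Set[str]:
    """Union of Softmax, RMSNorm, and RoPE islands."""
    consumers = build_consumers(nodes)
    softmax = {n.name for n in nodes.values() if n.op == "Softmax"}
    return softmax | rmsnorm_islands(nodes, consumers) | rope_islands(nodes, consumers)


def toy_graph() -> Dict[str, Node]:
    """Hypothetical graph: one RMSNorm, one attention Softmax, one RoPE region."""
    nodes = [
        Node("x", "Input"),
        Node("pow", "Pow", ["x"]),
        Node("mean", "ReduceMean", ["pow"]),
        Node("eps", "Add", ["mean"]),
        Node("sqrt", "Sqrt", ["eps"]),
        Node("inv", "Reciprocal", ["sqrt"]),
        Node("norm", "Mul", ["x", "inv"]),
        Node("qk", "MatMul", ["norm"]),
        Node("attn", "Softmax", ["qk"]),
        Node("pos", "Input"),
        Node("pos32", "Cast", ["pos"], cast_to="FP32"),
        Node("cos", "Cos", ["pos32"]),
        Node("sin", "Sin", ["pos32"]),
        Node("rope", "Mul", ["cos", "sin"]),
        Node("to16", "Cast", ["rope"], cast_to="FP16"),
        Node("out", "Add", ["attn", "to16"]),
    ]
    return {n.name: n for n in nodes}
```

In a real pipeline the marked nodes would get explicit FP32 casts at their boundaries before a strongly typed build, so the builder cannot re-pick their precision.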
Per-step cos-sim vs PT FP32: 0.99997 for a single step, 0.998 over all 8
steps. RMS-curve correlation: 0.998. The audio is essentially
indistinguishable from the FP32 PyTorch reference at roughly 1/15 the
wall time (43 ms vs 630 ms).
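One plausible reading of "RMS-curve correlation" is the Pearson correlation between the per-frame RMS envelopes of the two decoded waveforms. The sketch below assumes that reading. The frame and hop sizes are arbitrary choices, not values taken from this commit. It also checks the quoted wall-time ratio.

```python
import math


def rms_curve(signal, frame=1024, hop=512):
    """Per-frame RMS envelope of a mono signal given as a list of floats."""
    if not signal:
        return []
    last_start = max(len(signal) - frame, 0)
    curve = []
    for start in range(0, last_start + 1, hop):
        window = signal[start:start + frame]
        curve.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return curve


def pearson(a, b):
    """Pearson correlation of two sequences, truncated to the shorter one."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rms_curve_correlation(reference, candidate, frame=1024, hop=512):
    """Correlate the loudness envelopes of a reference and a candidate render."""
    return pearson(rms_curve(reference, frame, hop), rms_curve(candidate, frame, hop))


# Wall-time ratio quoted above: 630 ms (PyTorch FP32) vs 43 ms (TRT FP16-mixed).
speedup = 630 / 43  # about 14.65x, i.e. roughly 1/15 of the wall time
```

Pearson correlation ignores a constant gain, so this metric tracks the shape of the loudness envelope rather than absolute level. That is why it complements latent cos-sim instead of replacing it.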
Also adds dit_fp32.trt (1.8 GB sm-*, 5.8 GB medium) for users who want
bit-for-bit parity with the PyTorch reference at the cost of ~3x slower
inference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- tensorRT/sm_90/sa3-m/{dit_bf16.trt → dit_fp16mixed.trt} +2 -2
- tensorRT/sm_90/sa3-m/dit_fp32.trt +3 -0
- tensorRT/sm_90/sa3-sm-music/{dit_bf16.trt → dit_fp16mixed.trt} +2 -2
- tensorRT/sm_90/sa3-sm-music/dit_fp32.trt +3 -0
- tensorRT/sm_90/sa3-sm-sfx/{dit_bf16.trt → dit_fp16mixed.trt} +2 -2
- tensorRT/sm_90/sa3-sm-sfx/dit_fp32.trt +3 -0
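The file list implies one layout for every model: `tensorRT/<arch>/<model>/dit_<precision>.trt`. A hypothetical helper that builds these paths and rejects the dropped BF16 variant might look like this. The function and constant names are illustrative only.

```python
# Layout as listed in this commit; "sm_90" presumably denotes the CUDA
# compute capability (9.0) the engines were built for.
MODELS = ("sa3-m", "sa3-sm-music", "sa3-sm-sfx")
PRECISIONS = ("fp16mixed", "fp32")  # dit_bf16.trt was dropped


def engine_path(model: str, precision: str = "fp16mixed", arch: str = "sm_90") -> str:
    """Return the repo-relative path of a DiT TensorRT engine (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision {precision!r}; expected one of {PRECISIONS}")
    return f"tensorRT/{arch}/{model}/dit_{precision}.trt"
```

The returned string could be passed as `filename` to `huggingface_hub.hf_hub_download(repo_id="stabilityai/stable-audio-3-optimized", filename=...)`. TensorRT engines are generally tied to the GPU architecture and TensorRT version they were built with, so the `arch` segment matters.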
tensorRT/sm_90/sa3-m/{dit_bf16.trt → dit_fp16mixed.trt}

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:399f3fa18e21f86528322a4543fed17999f6bd95589886f2d4f8f3e2c77fc425
+size 2914585244
```
tensorRT/sm_90/sa3-m/dit_fp32.trt

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f6787419c134298f40bcd3a7e4b3fbc9427b880dc801e2a66a85a39fd326f964
+size 5820343524
```
tensorRT/sm_90/sa3-sm-music/{dit_bf16.trt → dit_fp16mixed.trt}

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:bb7be6e8d74392f4acfd954098c098f0c6f82d171870497b270e8a8281cb25f3
+size 935602284
```
tensorRT/sm_90/sa3-sm-music/dit_fp32.trt

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:35ea9b5f039abfc0b1fd2294ab814a9e26da0567342e29a700da1ee85ab4636d
+size 1842306180
```
tensorRT/sm_90/sa3-sm-sfx/{dit_bf16.trt → dit_fp16mixed.trt}

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1646a9d73ffe75098bd29a1e51770c38f2e70115eb86c2bbbcb7f7f2a9c89b82
+size 935536684
```
tensorRT/sm_90/sa3-sm-sfx/dit_fp32.trt

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e63e18d15e97689ad2afd8a2adf55b30f79568aa9114dba0bbf50931c9a2c3cf
+size 1842314452
```
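The diffs above change Git LFS pointer files, not the engine binaries themselves. Each pointer records the `sha256` oid and the byte size of the real object. The following minimal sketch checks a downloaded engine against its pointer. It assumes the Git LFS v1 pointer format (`version`, `oid sha256:<hex>`, `size <bytes>`). The helper names are illustrative.

```python
import hashlib

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text):
    """Return (sha256_hex, size_in_bytes) from a Git LFS v1 pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm {algo!r}")
    return digest, int(fields["size"])


def matches_pointer(chunks, pointer_text):
    """Hash an iterable of byte chunks and compare it with the pointer."""
    digest, size = parse_lfs_pointer(pointer_text)
    h = hashlib.sha256()
    total = 0
    for chunk in chunks:
        h.update(chunk)
        total += len(chunk)
    return total == size and h.hexdigest() == digest


def verify_file(path, pointer_text, chunk_size=1 << 20):
    """Stream a (possibly multi-GB) file in chunks and verify it."""
    with open(path, "rb") as f:
        return matches_pointer(iter(lambda: f.read(chunk_size), b""), pointer_text)
```

Streaming in 1 MiB chunks keeps memory flat even for the 5.8 GB medium FP32 engine. The size comparison is cheap, but it is checked only after hashing here for simplicity.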