T5Gemma: FP16-mixed (FP32 attention island) — fixes BF16 numerical bug
The previous BF16 build had a numerical bug: cosine similarity against the PyTorch
FP32 reference dropped to 0.17 at specific tokens (the ' beautiful' token, id 4964,
produced output activations of magnitude 26 instead of 52). Some tokens silently
produced bad conditioning for the downstream DiT.
Switch to FP16 trunk + FP32 attention island (STRONGLY_TYPED network), the
same strategy that fixed SAME-L decoder's accuracy. Cos vs FP32 PyTorch is
now 0.999998; the worst-token cos across our test prompts is 0.9987.
Engine filename: t5gemma_bf16.trt → t5gemma_fp16mixed.trt.
Engine size: 564 MB → 623 MB (+10%).
Latency: 0.78 ms → 0.91 ms (negligible vs DiT's 50+ ms).
ONNX: re-exported with mixed dtypes; STRONGLY_TYPED build respects them.
Consumer code (sa3_trt.py, install.sh, build.py, build_from_onnx.py) updated
in the github repo to point at the new filename.
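
The accuracy figures above (overall cos and worst-token cos against an FP32 reference) can be reproduced with a check along these lines. This is a minimal, dependency-free sketch and not the script used for this commit: the function names are hypothetical, the inputs are assumed to be per-token hidden-state rows (one list of floats per token), and "overall" is assumed to mean cosine over the flattened output. It also reports the norms of the worst token, since cosine alone does not reveal a pure magnitude error such as 26 vs 52.

```python
import math


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def compare_hidden_states(ref, test):
    """Compare two [tokens][hidden] outputs (hypothetical helper).

    ref  -- reference output, e.g. from an FP32 PyTorch run
    test -- output under test, e.g. from the built engine
    """
    per_token = [cosine(r, t) for r, t in zip(ref, test)]
    worst = min(range(len(per_token)), key=per_token.__getitem__)
    flat_ref = [x for row in ref for x in row]
    flat_test = [x for row in test for x in row]
    return {
        # Assumption: "overall" cosine is taken over the flattened output.
        "overall_cos": cosine(flat_ref, flat_test),
        "worst_index": worst,
        "worst_cos": per_token[worst],
        # Norms expose magnitude errors that cosine cannot see.
        "worst_ref_norm": math.sqrt(sum(x * x for x in ref[worst])),
        "worst_test_norm": math.sqrt(sum(x * x for x in test[worst])),
    }
```

Checking the per-token minimum rather than only the overall value matters here: a single bad token barely moves a cosine computed over the whole sequence.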
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:dc79684b14c9d5647bf3e7870f388837ac4b8d6708ea3ead82a37a0b39803084
+size 620393530
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1d33c35470e8cc44a919bd7b4a7959dc9adb9796457d8e7571b55a90b18046e3
+size 622964388
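
The files changed here are Git LFS pointers, so a downloaded artifact can be checked against the `oid` and `size` fields shown in the diff. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and the local path passed to `matches_pointer` is whatever location the file was downloaded to (the diff does not show filenames).

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

The size comparison runs first because it is cheap and catches truncated downloads before any hashing is done.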