Strip video_embeddings_connector + video_aggregate_embed (saves 4.77 GB)

Removed 131 dead tensors that audio-only inference never touches:
- model.diffusion_model.video_embeddings_connector.* (~3.2 GB, 8 transformer blocks at 4096-dim)
- text_embedding_projection.video_aggregate_embed.* (~1.5 GB)

Runtime audio_only=True path drops these before .to(device) and replaces video_aggregate_embed with a zero-returning dummy. Local A/B inference reproduces identical denoise/decode timing and output shape.

6.71 GB -> 1.94 GB.

Files changed (1) hide show

dramabox-audio-components.safetensors +2 -2

dramabox-audio-components.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5fc3fd010c386ddce78f58ac43600cdbc921c50b5b0426ff08167863bfd419d7
-size 5167945180

 version https://git-lfs.github.com/spec/v1
+oid sha256:73d50dd3e913fd1d2511a09e4a2225f60f2ede43ef629764e6d4a389422bf7d1
+size 1942831020