Identity vs TangoFlux Coefficients with TeaCache Acceleration
Stable Audio Open 1.0 Evaluation (Tesla T4)
Abstract
This experiment evaluates Identity vs TangoFlux coefficients under TeaCache acceleration for Stable Audio Open 1.0.
Initial benchmarks showed that Identity provided better raw speedup.
To investigate the quality trade-offs behind that speedup, we ran structured stress tests across rhythmic, spectral, temporal, and multi-layer scenarios.
Prompts were selected to intentionally stress known diffusion and caching weaknesses.
Each generated clip was fed to Gemini 3 (thinking mode), which was then asked to rate the generation.
Test Prompts & Rationale
The following prompts were intentionally designed to stress different structural and spectral weaknesses under TeaCache acceleration.
1️⃣ Rhythmic "Jitter" Test
Prompt:
"140 BPM sharp techno drum loop, heavy kick, crisp hi-hats, 44.1kHz."
Why:
Stable Audio Open uses a timing-conditioned latent diffusion model to generate variable-length audio.
When TeaCache skips updates, Identity coefficients may fail to preserve transient alignment precisely, potentially introducing rhythmic jitter or subtle beat drift.
Evaluation Criteria:
- BPM accuracy
- Timing jitter (ms)
- Peak consistency (kick uniformity)
- High-frequency crispness
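The first two criteria can be estimated from a list of kick onset times. The sketch below is only an illustration: the source does not say how onsets were detected or how jitter was defined, so `rhythm_metrics` is a hypothetical helper that assumes one onset per beat and treats jitter as the mean absolute deviation of inter-onset intervals from their median.

```python
from statistics import median

def rhythm_metrics(onsets_s):
    """Estimate BPM and timing jitter (ms) from onset times given in seconds.

    Assumptions (not from the source): one onset per beat, and jitter is the
    mean absolute deviation of inter-onset intervals from the median interval.
    """
    if len(onsets_s) < 3:
        raise ValueError("need at least three onsets")
    # Time between consecutive onsets.
    intervals = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    beat_s = median(intervals)  # robust estimate of seconds per beat
    bpm = 60.0 / beat_s
    jitter_ms = 1000.0 * sum(abs(i - beat_s) for i in intervals) / len(intervals)
    return bpm, jitter_ms
```

Using the median rather than the mean keeps a single extra or missing beat from skewing the tempo estimate.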
2️⃣ High-Frequency Clarity (Transient Test)
Prompt:
"Glass shattering on a concrete floor, high-pitched shards, long reverb tail."
Why:
The model targets high-fidelity 44.1kHz stereo synthesis, where high-frequency energy is critical.
Identity coefficients may over-smooth sharp spectral peaks, turning crisp impacts into broadband noise-like "whooshes."
Evaluation Criteria:
- Transient detection count
- Average peak width (sharpness)
- Spectral high-frequency ratio
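One plausible way to compute a transient count and a high-frequency ratio is sketched below with NumPy. The 4 kHz cutoff, the amplitude threshold, and the minimum gap between transients are illustrative assumptions; the source does not state which values or detector were used, and both function names are hypothetical.

```python
import numpy as np

def hf_energy_ratio(signal, sample_rate, cutoff_hz=4000.0):
    """Fraction of spectral energy at or above cutoff_hz (cutoff is an assumption)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    return float(spectrum[freqs >= cutoff_hz].sum() / total) if total > 0 else 0.0

def count_transients(signal, threshold=0.5, min_gap=441):
    """Count upward crossings of `threshold` in |signal| at least min_gap samples apart."""
    above = np.abs(signal) >= threshold
    # Indices where the signal goes from below to above the threshold.
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above[0]:
        rising = np.concatenate(([0], rising))
    count, last = 0, -min_gap
    for idx in rising:
        if idx - last >= min_gap:  # ignore re-triggers inside the same hit
            count += 1
            last = idx
    return count
```

At 44.1 kHz, `min_gap=441` corresponds to 10 ms, which merges the ringing of a single impact into one transient.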
3️⃣ Ghosting & Artifact (Drone Test)
Prompt:
"Ambient drone, slow evolving cinematic pads, deep bass, mystical atmosphere."
Why:
TeaCache reuses transformer residuals from previous timesteps to achieve ~1.5×–2.0× speedup.
Slow-moving, low-variation signals may suffer from:
- Residual ghosting
- Rhythmic wobble
- Amplitude pumping
if the cache update threshold is too aggressive.
Evaluation Criteria:
- Amplitude variance (pulsing detection)
- Spectral flatness
- High-frequency noise ratio
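Amplitude pumping and spectral flatness can be approximated as follows. This is a sketch under stated assumptions, not the measurement code behind the tables: the 2048-sample frame size and the definition of amplitude variance as the variance of a frame-wise RMS envelope are choices made here for illustration.

```python
import numpy as np

def amplitude_variance(signal, frame=2048):
    """Variance of the frame-wise RMS envelope; higher values suggest pumping."""
    n = len(signal) // frame
    # Drop the trailing partial frame and split into equal-length frames.
    frames = np.asarray(signal[: n * frame], dtype=float).reshape(n, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return float(rms.var())

def spectral_flatness(signal, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal, values closer to 1 a noise-like one.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps  # eps avoids log(0)
    return float(np.exp(np.log(power).mean()) / power.mean())
```

A steady pad should give a low amplitude variance, while a slow tremolo introduced by stale cached residuals would raise it.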
4️⃣ Complex Layering Test
Prompt:
"A man speaking in a crowded cafe with clinking plates and background jazz."
Why:
Stable Audio Open is known to struggle with prompts that use connectors (such as "with" and "and") to describe multiple simultaneous sound sources.
This test evaluates whether TeaCache acceleration exacerbates:
- Foreground/background blending
- Stereo image collapse
- Layer loss
Evaluation Criteria:
- Stereo width (correlation)
- Speech presence band (300Hz–3kHz energy)
- Dynamic variance (layer separation clarity)
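The stereo and speech-band criteria can be sketched with NumPy as shown below. The function names are hypothetical, and the use of Pearson correlation between the left and right channels is an assumption about how "correlation" was measured; a value of 1.0 means the channels are identical, which is a mono image.

```python
import numpy as np

def stereo_correlation(left, right):
    """Pearson correlation of the two channels; 1.0 means identical (mono) channels."""
    return float(np.corrcoef(left, right)[0, 1])

def band_energy_ratio(signal, sample_rate, low_hz=300.0, high_hz=3000.0):
    """Fraction of spectral energy inside [low_hz, high_hz], the speech presence band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    total = spectrum.sum()
    return float(spectrum[in_band].sum() / total) if total > 0 else 0.0
```

For a stereo clip stored as an array of shape `(2, n_samples)`, `stereo_correlation(clip[0], clip[1])` gives the width figure and `band_energy_ratio(clip.mean(axis=0), 44100)` gives the speech presence on the mono mix.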
Hardware & Configuration
- GPU: Tesla T4 16GB
- Total Steps: 50
- Max Audio Length: 10 seconds
- Precision: float16
- TeaCache r1_threshold values tested:
  - 0.2 (Stable / Conservative)
  - 0.6 (Aggressive / Stress Test)
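To make the role of the threshold and the coefficients concrete, here is a minimal sketch of the skip decision as TeaCache is commonly described: the relative L1 change of the model input is rescaled by a polynomial, accumulated across steps, and the transformer is re-run only once the accumulated value reaches the threshold. Everything named here is illustrative. `plan_steps` and `polyval` are hypothetical helpers, "Identity" is assumed to mean the polynomial p(x) = x, and the actual TangoFlux coefficient values are not given in the source, so none are shown.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with the highest-degree coefficient first (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

# Assumed meaning of "Identity" coefficients: p(x) = x, i.e. no rescaling.
IDENTITY = [1.0, 0.0]

def plan_steps(rel_l1_distances, threshold, coeffs=IDENTITY):
    """Return one boolean per step: True = run the transformer, False = reuse the cache.

    rel_l1_distances[i] is the relative L1 change of the model input between
    step i-1 and step i. Entry 0 is ignored because the first step always
    computes; the last step always computes as well.
    """
    decisions, accumulated = [], 0.0
    last = len(rel_l1_distances) - 1
    for i, distance in enumerate(rel_l1_distances):
        if i == 0 or i == last:
            compute = True
        else:
            accumulated += polyval(coeffs, distance)
            compute = accumulated >= threshold
        if compute:
            accumulated = 0.0  # cache refreshed, so start accumulating again
        decisions.append(compute)
    return decisions
```

With a constant per-step change, raising the threshold from 0.2 to 0.6 makes `plan_steps` return fewer `True` entries, which is where both the extra speedup and the extra risk of stale residuals come from. A different coefficient set changes how quickly the accumulator fills for the same input change.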
Results
Rhythm Test Results
| File | BPM | Timing Jitter | Peak Consistency (variance) ↓ | HF Crispness | Rating |
|---|---|---|---|---|---|
| rhythm_02_identity | 140.0 | 21.2 ms | 1066 | 62.8% | 7/10 |
| rhythm_02_tangoflux | 140.5 | 23.1 ms | 266 | 37.4% | 9/10 |
| rhythm_06_identity | 140.1 | 6.9 ms | 1164 | 53.1% | 6/10 |
| rhythm_06_tangoflux | 143.7 | 40.1 ms | 3663 | 66.1% | 3/10 |
Analysis
Threshold 0.2
Identity (7/10)
On-grid tempo but high peak variance → inconsistent kick dynamics.
TangoFlux (9/10)
4× better peak consistency → uniform professional thump.
Slight BPM drift but stronger rhythmic lock.
Threshold 0.6 (Stress)
Identity (6/10)
Stable timing but reduced crispness → muffled sound.
TangoFlux (3/10)
Severe instability: extra beats, 40ms jitter, massive variance.
Audible rhythmic breakdown.
Rhythm Summary
TangoFlux excels at low thresholds.
Identity is more robust under aggressive caching.
Transient Test Results
| File | Transients | Avg Peak Width ↓ | HF Ratio | Rating |
|---|---|---|---|---|
| transient_02_identity | 7 | 0.100 ms | 37.5% | 5/10 |
| transient_02_tangoflux | 3 | 0.076 ms | 42.9% | 8/10 |
| transient_06_identity | 3 | 0.060 ms | 42.8% | 7/10 |
| transient_06_tangoflux | 4 | 0.057 ms | 53.4% | 9/10 |
Analysis
Threshold 0.2
- Identity: More peaks but blurred.
- TangoFlux: Sharper and brighter impacts.
Threshold 0.6
- Identity: Clean but simplified.
- TangoFlux: Best sharpness and HF preservation.
Transient Summary
TangoFlux clearly superior for high-frequency bite and sharp impacts.
Layering Test Results
| File | Stereo Width (L/R correlation) ↓ | Speech Presence | Dynamic Variance | Rating |
|---|---|---|---|---|
| layering_02_identity | 0.78 | 71.9% | 0.081 | 8/10 |
| layering_02_tangoflux | 0.69 | 51.3% | 0.040 | 6/10 |
| layering_06_identity | 0.54 | 65.2% | 0.060 | 7/10 |
| layering_06_tangoflux | 1.00 (Mono) | 40.1% | 0.031 | 1/10 |
Analysis
Threshold 0.2
- Identity preserves voice clarity and depth.
- TangoFlux compresses layers.
Threshold 0.6
- Identity fails gracefully.
- TangoFlux collapses to mono and loses subject separation.
Layering Summary
Identity is significantly more reliable for complex environments.
Drone / Temporal Test Results
| File | Amplitude Variance ↓ | Spectral Flatness | HF Noise | Rating |
|---|---|---|---|---|
| drone_02_identity | 0.038 | 0.081 | 14.2% | 9/10 |
| drone_02_tangoflux | 0.045 | 0.076 | 13.5% | 8/10 |
| drone_06_identity | 0.052 | 0.104 | 16.1% | 7/10 |
| drone_06_tangoflux | 0.071 | 0.092 | 22.4% | 4/10 |
Analysis
Threshold 0.2
Identity produces the most stable amplitude profile.
Threshold 0.6
TangoFlux introduces noticeable wobble and HF artifacts.
Drone Summary
Identity is superior for atmospheric and evolving pads.
Final Verdict (Tesla T4 Context)
| Use Case | Recommended Coefficient |
|---|---|
| Sharp transients (glass, snares, impacts) | TangoFlux (r1=0.2) |
| Drum loops at conservative thresholds | TangoFlux (r1=0.2) |
| Multi-layered environments | Identity |
| Drones & atmospheric audio | Identity |
| Aggressive caching (r1=0.6+) | Identity (more stable) |
Overall Rating: 8.5 / 10
The experiment demonstrates a clear trade-off:
- TangoFlux preserves spectral sharpness and transient attack.
- Identity provides better stability and graceful degradation under high caching thresholds.
On a Tesla T4, where compute is limited, Identity offers the better balance of speed and stable quality for complex audio scenes.
Coefficient Comparison Summary
| Feature | Identity | TangoFlux |
|---|---|---|
| Transient Response | Blurs sharp sounds | Preserves HF detail |
| Rhythmic Stability | Stable timing but uneven kick dynamics | Excellent at the low threshold, unstable at the high threshold |
| Layer Separation | Strong foreground/background separation | Risk of muddiness or mono collapse |
| Temporal Flow | Minimal pulsing | Prone to wobble artifacts |
Conclusion
TangoFlux is best for high-frequency, sharp, single-subject sounds.
Identity is more reliable for:
- Multi-layered environments
- Atmospheric content
- High-threshold TeaCache usage
- T4-constrained deployments
This experiment highlights how coefficient selection directly impacts structural vs spectral fidelity in accelerated diffusion audio pipelines.