Akshat commited on
Commit
23d04f6
·
verified ·
1 Parent(s): d9b19fc

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +67 -37
README.md CHANGED
@@ -27,67 +27,97 @@ Prompts were selected to intentionally stress known diffusion and caching weakne
27
 
28
  # Test Prompts & Rationale
29
 
30
- 1️⃣ Rhythmic "Jitter" Test
31
 
32
- Prompt:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
33
 
34
- "140 BPM sharp techno drum loop, heavy kick, crisp hi-hats, 44.1kHz."
35
- Why:
36
- Stable Audio Open uses a timing-conditioned latent diffusion model to handle variable-length audio. Identity coefficients may fail to capture transient timing precisely when skipping steps, potentially causing rhythmic jitter or "beat drift".
37
- Evaluation Criteria:
38
 
39
- BPM accuracy
40
 
41
- Timing jitter
42
 
43
- Peak consistency
 
44
 
45
- High-frequency crispness
46
 
47
- 2️⃣ High-Frequency Clarity (Transient Test)
 
 
48
 
49
- Prompt:
50
 
51
- "Glass shattering on a concrete floor, high-pitched shards, long reverb tail."
52
- Why:
53
- The model targets high-fidelity 44.1kHz output with energy in high frequencies. Identity coefficients may over-smooth these peaky regions, turning sharp "shards" into generic white-noise "whooshes."
54
- Evaluation Criteria:
55
 
56
- Transient detection count
57
 
58
- Peak width (sharpness)
59
 
60
- Spectral high-frequency ratio
61
 
62
- 3️⃣ Ghosting & Artifact (Drone Test)
 
63
 
64
- Prompt:
 
 
65
 
66
- "Ambient drone, slow evolving cinematic pads, deep bass, mystical atmosphere."
67
- Why:
68
- TeaCache reuses transformer residuals from previous steps to provide a 1.5x-2.0x speedup. Slow-moving signals may suffer from "ghosting" or rhythmic "wobbles" if the cache is updated too aggressively.
69
- Evaluation Criteria:
70
 
71
- Amplitude variance (Pulsing)
72
 
73
- Spectral flatness
 
 
 
 
74
 
75
- High-frequency noise
76
 
77
- 4️⃣ Complex Layering Test
78
 
79
- Prompt:
80
 
81
- "A man speaking in a crowded cafe with clinking plates and background jazz."
82
- Why:
83
- Stable Audio Open specifically finds it challenging to generate prompts with "connectors" and may omit sounds or merge layers. This tests if caching exacerbates the "muddying" of foreground speech against background noise.
84
- Evaluation Criteria:
85
 
86
- Stereo width
 
87
 
88
- Speech presence band (300Hz–3kHz)
 
 
 
 
 
 
 
 
 
 
89
 
90
- Dynamic variance
91
 
92
  ---
93
 
 
27
 
28
  # Test Prompts & Rationale
29
 
30
+ The following prompts were intentionally designed to stress different structural and spectral weaknesses under TeaCache acceleration.
31
 
32
+ ---
33
+
34
+ ## 1️⃣ Rhythmic "Jitter" Test
35
+
36
+ **Prompt:**
37
+
38
+ > "140 BPM sharp techno drum loop, heavy kick, crisp hi-hats, 44.1kHz."
39
+
40
+ **Why:**
41
+
42
+ Stable Audio Open uses a timing-conditioned latent diffusion model to generate variable-length audio.
43
+ When TeaCache skips updates, Identity coefficients may fail to preserve transient alignment precisely, potentially introducing rhythmic jitter or subtle beat drift.
44
+
45
+ **Evaluation Criteria:**
46
+
47
+ - BPM accuracy
48
+ - Timing jitter (ms)
49
+ - Peak consistency (kick uniformity)
50
+ - High-frequency crispness
51
+
52
+ ---
53
+
54
+ ## 2️⃣ High-Frequency Clarity (Transient Test)
55
 
56
+ **Prompt:**
 
 
 
57
 
58
+ > "Glass shattering on a concrete floor, high-pitched shards, long reverb tail."
59
 
60
+ **Why:**
61
 
62
+ The model targets high-fidelity 44.1kHz stereo synthesis, where high-frequency energy is critical.
63
+ Identity coefficients may over-smooth sharp spectral peaks, turning crisp impacts into broadband noise-like "whooshes."
64
 
65
+ **Evaluation Criteria:**
66
 
67
+ - Transient detection count
68
+ - Average peak width (sharpness)
69
+ - Spectral high-frequency ratio
70
 
71
+ ---
72
 
73
+ ## 3️⃣ Ghosting & Artifact (Drone Test)
 
 
 
74
 
75
+ **Prompt:**
76
 
77
+ > "Ambient drone, slow evolving cinematic pads, deep bass, mystical atmosphere."
78
 
79
+ **Why:**
80
 
81
+ TeaCache reuses transformer residuals from previous timesteps to achieve ~1.5×–2.0× speedup.
82
+ Slow-moving, low-variation signals may suffer from:
83
 
84
+ - Residual ghosting
85
+ - Rhythmic wobble
86
+ - Amplitude pumping
87
 
88
+ if the cache update threshold is too aggressive.
 
 
 
89
 
90
+ **Evaluation Criteria:**
91
 
92
+ - Amplitude variance (pulsing detection)
93
+ - Spectral flatness
94
+ - High-frequency noise ratio
95
+
96
+ ---
97
 
98
+ ## 4️⃣ Complex Layering Test
99
 
100
+ **Prompt:**
101
 
102
+ > "A man speaking in a crowded cafe with clinking plates and background jazz."
103
 
104
+ **Why:**
 
 
 
105
 
106
+ Stable Audio Open is known to struggle with prompts involving connectors (multiple simultaneous sound sources).
107
+ This test evaluates whether TeaCache acceleration exacerbates:
108
 
109
+ - Foreground/background blending
110
+ - Stereo image collapse
111
+ - Layer loss
112
+
113
+ **Evaluation Criteria:**
114
+
115
+ - Stereo width (correlation)
116
+ - Speech presence band (300Hz–3kHz energy)
117
+ - Dynamic variance (layer separation clarity)
118
+
119
+ ---
120
 
 
121
 
122
  ---
123