Beyond Transcript Alignment
Frozen-frozen speech-LLM adapters + counterfactual training. Code: github.com/Nurgali-Kadyrbek/frozen-speech-llm-stress • 8 items
3-seed falsification control: counterfactual pairs (cf-pairs) in which each pair member's audio is drawn from a DIFFERENT transcript, while the transcript and Φ labels are preserved; 3666 shuffled pairs in total. This tests whether the R1.8 gain requires correct (audio, transcript, Φ) alignment.
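A minimal sketch of the shuffle control described above, assuming a flat list of pair members with hypothetical fields `transcript_id`, `audio_path`, and `phi` (these names are illustrative and do not come from the repository). Each member keeps its transcript and Φ labels but receives audio sampled, with a fixed seed, from some member whose transcript differs. This illustrates the idea only; it is not the authors' implementation.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PairMember:
    """Hypothetical record for one member of a counterfactual pair."""
    transcript_id: str  # transcript this member is labelled with
    audio_path: str     # audio clip currently attached to the member
    phi: tuple          # Φ labels; never modified by the shuffle


def shuffle_audio(members, seed):
    """Reassign audio so that no member keeps audio from its own transcript.

    Transcript and Φ labels are copied through unchanged; only audio_path
    is replaced by the audio of a randomly chosen member whose
    transcript_id differs. Donors are sampled with replacement.
    """
    rng = random.Random(seed)  # seeded RNG -> reproducible shuffle per seed
    shuffled = []
    for m in members:
        # Donor pool: every member whose transcript differs from m's.
        donors = [d for d in members if d.transcript_id != m.transcript_id]
        if not donors:
            raise ValueError("need at least two distinct transcripts")
        donor = rng.choice(donors)
        shuffled.append(replace(m, audio_path=donor.audio_path))
    return shuffled


# Toy example: 6 members over 3 transcripts, shuffled with one seed.
example = [PairMember(f"t{i // 2}", f"a{i}.wav", (i % 2,)) for i in range(6)]
for m in shuffle_audio(example, seed=1234):
    print(m.transcript_id, m.audio_path, m.phi)
```

The donor scan is quadratic in the number of members; grouping audio by transcript first would scale better to thousands of pairs, and drawing a derangement instead of sampling with replacement is an alternative if each clip should be reused exactly once.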
Pre-registered PASS gate: MLP-2 cohort ≤ 0.275 (= R0 cohort MLP-2 0.245 + 0.030). Result: 0.2033 ≤ 0.275 → PASS, gap to R1.8 cohort = -0.103.
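The gate is a single one-sided comparison against the R0 score plus a fixed margin. A minimal sketch, with 0.245 and 0.030 taken from the card and the function names hypothetical:

```python
R0_COHORT_MLP2 = 0.245  # R0 cohort MLP-2 score quoted on the card
MARGIN = 0.030          # pre-registered tolerance above R0


def gate_threshold(baseline=R0_COHORT_MLP2, margin=MARGIN):
    # Round to the card's 4-decimal reporting precision so the comparison
    # is not sensitive to floating-point noise in the sum.
    return round(baseline + margin, 4)


def gate_passes(shuffle_mlp2, baseline=R0_COHORT_MLP2, margin=MARGIN):
    """PASS if the shuffled-control score does not exceed R0 by more than the margin."""
    return shuffle_mlp2 <= gate_threshold(baseline, margin)


print(gate_threshold(), gate_passes(0.2033))
```

Because the gate is one-sided, a control that lands below R0, as the 0.2033 here does, also passes.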
Files: A_R1p8_shuffle_seed{1234,1235,1236}.pt (~357 MB each)
Reported metrics:
| Seed | Probe-K linear | Probe-K MLP-2 |
|---|---|---|
| 1234 | 0.1974 | 0.1905 |
| 1235 | 0.2265 | 0.2079 |
| 1236 | 0.2009 | 0.2114 |
| mean ± σ (sample SD, n = 3) | 0.2083 ± 0.0159 | 0.2033 ± 0.0112 |
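When audio is decorrelated from the labels, the mean linear-probe score falls to 0.208, slightly below the R0 baseline of 0.211 and well within one seed standard deviation (0.016) of it: without correct alignment, the adapter encodes no Φ beyond the R0 baseline.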
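The summary row can be re-derived from the per-seed scores; it matches the sample standard deviation (n − 1 denominator, as computed by `statistics.stdev`). The sketch below also expresses the gap between the shuffled linear probe and the 0.211 R0 baseline in units of the seed SD. Variable names are illustrative; only the numbers come from the card.

```python
from statistics import mean, stdev

# Per-seed scores copied from the table: seed -> (Probe-K linear, Probe-K MLP-2)
SCORES = {
    1234: (0.1974, 0.1905),
    1235: (0.2265, 0.2079),
    1236: (0.2009, 0.2114),
}
R0_LINEAR = 0.211  # R0 linear-probe baseline quoted in the text


def summarize(values):
    """Return (mean, sample SD) rounded to 4 decimals, as reported in the table."""
    return round(mean(values), 4), round(stdev(values), 4)


linear = [lin for lin, _ in SCORES.values()]
mlp2 = [m for _, m in SCORES.values()]

lin_mean, lin_sd = summarize(linear)
mlp_mean, mlp_sd = summarize(mlp2)

# Gap between the shuffled linear probe and the R0 baseline, in seed-SD units.
z_linear = (mean(linear) - R0_LINEAR) / stdev(linear)

print(f"linear: {lin_mean} ± {lin_sd}")
print(f"MLP-2:  {mlp_mean} ± {mlp_sd}")
print(f"linear vs R0: {z_linear:+.2f} SD")
```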
Code / paper: https://github.com/Nurgali-Kadyrbek/frozen-speech-llm-stress
License: CC-BY-NC-4.0.