BTA — Stage 4 Control B (audio-decorrelated cf-pairs)

3-seed falsification control: each member of a cf-pair receives audio drawn from a DIFFERENT transcript, while the transcript and Φ labels are preserved. 3666 shuffled pairs total. Tests whether the R1.8 gain requires correct (audio, transcript, Φ) alignment.

Pre-registered PASS gate: MLP-2 cohort ≤ 0.275 (= R0 cohort MLP-2 0.245 + 0.030). Result: 0.2033 ≤ 0.275 → PASS, gap to R1.8 cohort = -0.103.
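The gate arithmetic above can be re-checked with a short sketch. The numbers are the ones quoted in this card. The function and variable names are hypothetical and are not taken from the linked repository.

```python
# Values quoted in this card; names are illustrative only.
R0_MLP2 = 0.245           # R0 cohort MLP-2 score
MARGIN = 0.030            # pre-registered tolerance above R0
CONTROL_B_MLP2 = 0.2033   # Control B 3-seed mean, MLP-2


def gate(score, baseline, margin):
    """Return (threshold, passed) for a 'score <= baseline + margin' gate."""
    # Round to 4 decimals so float noise in the sum cannot flip the comparison.
    threshold = round(baseline + margin, 4)
    return threshold, score <= threshold


threshold, passed = gate(CONTROL_B_MLP2, R0_MLP2, MARGIN)
print(f"threshold={threshold:.3f} passed={passed}")
```

The threshold is rounded before the comparison because the gate is stated to three decimals. With these inputs the control sits well under the threshold, so the rounding does not affect the outcome.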

Files: A_R1p8_shuffle_seed{1234,1235,1236}.pt (~357 MB each)
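As a small convenience, the brace pattern above expands to three filenames. The sketch below builds them and reports which are present in a local directory. The helper name and the default directory are assumptions for illustration.

```python
from pathlib import Path

SEEDS = (1234, 1235, 1236)  # seeds listed in the file pattern above


def checkpoint_paths(root="."):
    """Expand A_R1p8_shuffle_seed{...}.pt into one path per seed under root."""
    return [Path(root) / f"A_R1p8_shuffle_seed{seed}.pt" for seed in SEEDS]


paths = checkpoint_paths()
for p in paths:
    status = "found" if p.exists() else "missing"
    print(f"{p.name}: {status}")
```

Once downloaded, a `.pt` file can typically be opened with `torch.load(path, map_location="cpu")`. The internal layout of these checkpoints is not documented in this card, so inspect the loaded object before relying on any key names.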

Reported metrics:

| Seed | Probe-K linear | Probe-K MLP-2 |
|---|---|---|
| 1234 | 0.1974 | 0.1905 |
| 1235 | 0.2265 | 0.2079 |
| 1236 | 0.2009 | 0.2114 |
| mean ± σ | 0.2083 ± 0.0159 | 0.2033 ± 0.0112 |
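The summary row can be reproduced from the three per-seed values. In the sketch below, σ is computed as the sample standard deviation (n − 1 denominator). That choice is inferred from the reported figures, which it matches to four decimals. The card does not state which estimator was used.

```python
from statistics import mean, stdev

# Per-seed Probe-K scores copied from the table above (seeds 1234, 1235, 1236).
probe_k = {
    "linear": [0.1974, 0.2265, 0.2009],
    "mlp2": [0.1905, 0.2079, 0.2114],
}


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to 4 decimals."""
    return round(mean(values), 4), round(stdev(values), 4)


summary = {name: summarize(values) for name, values in probe_k.items()}
for name, (m, s) in summary.items():
    print(f"{name}: {m:.4f} ± {s:.4f}")
```

Using the population standard deviation (`statistics.pstdev`) instead would give smaller values that do not match the reported row.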

When audio is decorrelated from the labels, the mean linear-probe score falls to 0.208, below the R0 baseline of 0.211. Under misalignment, the adapter encodes no Φ information beyond that baseline.

Code / paper: https://github.com/Nurgali-Kadyrbek/frozen-speech-llm-stress

License: CC-BY-NC-4.0.


Base model: Qwen/Qwen3-8B (this model, nur-dev/frozen-stress-stage4-controlb, is a finetune).