BTA — Stage 4 Control B (audio-decorrelated cf-pairs)

3-seed falsification control: cf-pairs in which each pair member's audio is drawn from a different transcript, while the transcript and Φ labels are preserved. 3666 shuffled pairs in total. The control tests whether the R1.8 gain requires correct (audio, transcript, Φ) alignment.
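The decorrelation step described above can be pictured with a small sketch. This is an illustration only, not the repository's code: the `Member` record, its field names, and the rejection-sampling shuffle are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Member:
    """Hypothetical record for one cf-pair member (field names are assumed)."""
    audio_src: str   # id of the transcript the audio clip was taken from
    transcript: str  # transcript label, kept fixed by the control
    phi: int         # Φ label, kept fixed by the control


def decorrelate_audio(members, seed, max_tries=1000):
    """Reassign audio so no member keeps audio from its own transcript."""
    rng = random.Random(seed)
    sources = [m.audio_src for m in members]
    for _ in range(max_tries):
        perm = sources[:]
        rng.shuffle(perm)
        # Accept the shuffle only if every clip is mismatched with its transcript.
        if all(src != m.transcript for src, m in zip(perm, members)):
            return [replace(m, audio_src=src) for m, src in zip(members, perm)]
    raise RuntimeError("no valid shuffle found; too few distinct transcripts?")


members = [
    Member("t1", "t1", 0),
    Member("t2", "t2", 1),
    Member("t3", "t3", 0),
    Member("t4", "t4", 1),
]
shuffled = decorrelate_audio(members, seed=1234)
for m in shuffled:
    print(m)
```

Only the audio assignment changes; the transcript and Φ fields pass through untouched, which is the property the control relies on.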

Pre-registered PASS gate: cohort MLP-2 ≤ 0.275 (R0 cohort MLP-2 of 0.245 plus a 0.030 margin). Result: 0.2033 ≤ 0.275 → PASS; the MLP-2 gap to the R1.8 cohort is -0.103.
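The gate arithmetic can be checked in a few lines. The variable names below are illustrative; the numbers are the ones quoted in this card.

```python
# Values quoted in this card; names are illustrative, not from the repository.
R0_COHORT_MLP2 = 0.245      # R0 cohort MLP-2 score
GATE_MARGIN = 0.030         # pre-registered allowance above R0
CONTROL_B_MLP2 = 0.2033     # Control B mean MLP-2 score

# Round to three decimals to avoid floating-point noise in the sum.
threshold = round(R0_COHORT_MLP2 + GATE_MARGIN, 3)
passed = CONTROL_B_MLP2 <= threshold

print(f"threshold = {threshold:.3f}, control B = {CONTROL_B_MLP2:.4f}, "
      f"{'PASS' if passed else 'FAIL'}")
```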

Files: A_R1p8_shuffle_seed{1234,1235,1236}.pt (~357 MB each)

Reported metrics:

Seed	Probe-K linear	Probe-K MLP-2
1234	0.1974	0.1905
1235	0.2265	0.2079
1236	0.2009	0.2114
mean ± σ	0.2083 ± 0.0159	0.2033 ± 0.0112
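The summary row can be reproduced from the per-seed values. The sketch below assumes σ is the sample standard deviation (n - 1 in the denominator), which matches the reported figures; the dictionary and function names are illustrative.

```python
from statistics import mean, stdev

# Per-seed probe scores copied from the table above.
linear = {1234: 0.1974, 1235: 0.2265, 1236: 0.2009}
mlp2 = {1234: 0.1905, 1235: 0.2079, 1236: 0.2114}


def summarize(scores):
    """Return (mean, sample standard deviation), rounded to 4 decimals."""
    vals = list(scores.values())
    return round(mean(vals), 4), round(stdev(vals), 4)


linear_summary = summarize(linear)
mlp2_summary = summarize(mlp2)

print("linear: %.4f ± %.4f" % linear_summary)
print("MLP-2:  %.4f ± %.4f" % mlp2_summary)
```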

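When audio is decorrelated from the labels, the linear probe falls to 0.208, just below the R0 baseline of 0.211: the adapter carries no linearly decodable Φ information beyond that baseline. This is consistent with the R1.8 gain depending on correct (audio, transcript, Φ) alignment.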

Code / paper: https://github.com/Nurgali-Kadyrbek/frozen-speech-llm-stress

License: CC-BY-NC-4.0.


Model tree: Qwen/Qwen3-8B-Base → Qwen/Qwen3-8B (fine-tuned) → nur-dev/frozen-stress-stage4-controlb (this model)

Collection: Beyond Transcript Alignment (frozen-frozen speech-LLM adapters + counterfactual training; 8 items)