BTA — Stage 7 styled-teacher (BRIDGING-NULL pilot)

Single-axis change from the R1.8 cohort: the BLSP teacher's text input becomes transcript + " [stress on word: <word>]" on counterfactual-pair members and stress-individual rows. LibriSpeech and Expresso rows (no stress label) keep the plain transcript. Everything else is bit-identical to R1.8.
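
The styling rule above can be sketched as a small helper. This is only an illustration: the function name build_teacher_text and the convention of passing None for rows without a stress label are assumptions, not names from the Stage 7 code. Only the tag format is taken from the description above.

```python
from typing import Optional


def build_teacher_text(transcript: str, stress_word: Optional[str] = None) -> str:
    """Return the teacher's text input for one row (hypothetical helper).

    Rows with a stress label (counterfactual-pair members, stress-individual
    rows) get the styled tag appended. Rows without one (LibriSpeech,
    Expresso) keep the plain transcript.
    """
    if stress_word is None:
        # No stress label: leave the transcript untouched.
        return transcript
    # Tag format as stated in the notes: transcript + " [stress on word: <word>]"
    return f"{transcript} [stress on word: {stress_word}]"


styled = build_teacher_text("she said no", "no")
plain = build_teacher_text("she said no")
```

Here styled carries the appended tag while plain is returned unchanged, which is what keeps the change single-axis.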

This is the frozen-encoder ablation of the SpeechEmotionLlama \citep{kang2025frozenllm} alignment-target intervention. To our knowledge, no such ablation appears in their published work.

Outcome branch: BRIDGING-NULL. Probe-G$_{\mathrm{neutral}}$ = 0.5158, inside the R0 NULL band [0.4972, 0.5272]. Oracle re-confirm = 0.7871 (zero drift). Probe-K MLP-2 cohort = 0.3066 (still above $K_T$ = 0.290). C.1 spread = 0.021 (tightest yet); C.2 ratio = 0.970 (strongest yet).

The stage's reference baselines are decisive:

$K_T^{\mathrm{styled}}$ neutral = 0.7901 (gold transcript + tag in audio slot)
Cascade-T+L neutral = 0.8020 (Whisper transcript + tag in audio slot)
Stage 7 audio adapter neutral = 0.5158
Distillation gap: 0.7901 − 0.5158 = 0.2743 absolute
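
The arithmetic behind the gap and the NULL-band membership can be checked with a few lines. The constant and function names below are hypothetical; the values are copied from the figures reported in this note.

```python
# Reported values from this note (names are illustrative, not from the codebase).
K_T_STYLED_NEUTRAL = 0.7901   # gold transcript + tag in audio slot
CASCADE_TL_NEUTRAL = 0.8020   # Whisper transcript + tag in audio slot
ADAPTER_NEUTRAL = 0.5158      # Stage 7 audio adapter
R0_NULL_BAND = (0.4972, 0.5272)


def distillation_gap(reference: float, adapter: float) -> float:
    """Absolute gap between a text-form reference and the audio adapter."""
    return round(reference - adapter, 4)


def in_null_band(score: float, band: tuple = R0_NULL_BAND) -> bool:
    """True if the score lies inside the closed NULL band."""
    low, high = band
    return low <= score <= high


gap = distillation_gap(K_T_STYLED_NEUTRAL, ADAPTER_NEUTRAL)
is_null = in_null_band(ADAPTER_NEUTRAL)
```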

In text form, the frozen Qwen3-8B consumes the styled paralinguistic tag at 0.79 or better (0.7901 with the gold transcript, 0.8020 with the Whisper transcript). Under the frozen-encoder regime, the audio adapter trained with L_KL distillation does not elicit the equivalent tag-conditioned response from continuous audio prefixes: it stays at 0.5158.
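
For readers unfamiliar with the objective, a KL distillation term over next-token distributions can be sketched as follows. This is a generic illustration, not the Stage 7 implementation: the direction KL(teacher || student), the per-position formulation, and the function names are all assumptions. The teacher logits would come from the model reading the styled text, the student logits from the same model reading the audio prefix.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(teacher_logits: List[float],
                         student_logits: List[float]) -> float:
    """KL(teacher || student) for a single token position (assumed direction).

    Terms where the teacher probability underflows to zero contribute
    nothing and are skipped. A production version would work in log-space.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)


# Identical logits give zero loss; any mismatch gives a positive loss.
same = kl_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = kl_distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```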

Files: