BTA - Stage 7 styled-teacher (BRIDGING-NULL pilot)
Single-axis change from the R1.8 cohort: the BLSP teacher's text input
becomes transcript + " [stress on word: <word>]" on
counterfactual-pair members and stress-individual rows. LibriSpeech
and Expresso rows (no stress label) keep the plain transcript. Everything
else is bit-identical to R1.8.
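The text-input change described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `build_teacher_text` and the `stress_word` argument are hypothetical, and only the tag template itself is taken from the description above.

```python
from typing import Optional

# Tag template as stated in this README; everything else here is hypothetical.
STRESS_TAG_TEMPLATE = " [stress on word: {word}]"


def build_teacher_text(transcript: str, stress_word: Optional[str] = None) -> str:
    """Return the teacher's text input for one row.

    Rows with a stress label (counterfactual-pair members and
    stress-individual rows) get the tag appended. Rows without one
    (e.g. LibriSpeech, Expresso) keep the plain transcript.
    """
    if stress_word is None:
        return transcript
    return transcript + STRESS_TAG_TEMPLATE.format(word=stress_word)


if __name__ == "__main__":
    print(build_teacher_text("I never said she stole it", "never"))
    print(build_teacher_text("the quick brown fox"))
```

Keeping the tag in a single template constant makes it easy to confirm that the text input is the only axis that differs from R1.8.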
This is the frozen-encoder ablation of the SpeechEmotionLlama \citep{kang2025frozenllm} alignment-target intervention. To our knowledge no such ablation exists in their published work.
Outcome branch: BRIDGING-NULL. Probe-G$_{\mathrm{neutral}}$ = 0.5158, inside the R0 NULL band [0.4972, 0.5272]. Oracle re-confirm 0.7871 (zero drift). Probe-K MLP-2 cohort = 0.3066 (still > K_T 0.290). C.1 spread 0.021 (tightest yet); C.2 ratio 0.970 (strongest yet).
The stage's reference baselines are decisive:
- $K_T^{\mathrm{styled}}$ neutral = 0.7901 (gold transcript + tag in audio slot)
- Cascade-T+L neutral = 0.8020 (Whisper transcript + tag in audio slot)
- Stage 7 audio adapter neutral = 0.5158
- Distillation gap: 0.7901 − 0.5158 = 0.2743 absolute
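The arithmetic behind these baselines can be re-checked with a few lines of Python. The scores are the ones reported in this README. The variable names are illustrative, and the gap to the Cascade-T+L baseline is derived here by subtraction rather than reported separately.

```python
# Neutral scores as reported in this README.
K_T_STYLED = 0.7901   # gold transcript + tag in audio slot
CASCADE_TL = 0.8020   # Whisper transcript + tag in audio slot
ADAPTER = 0.5158      # Stage 7 audio adapter
NULL_BAND = (0.4972, 0.5272)  # R0 NULL band


def gap(reference: float, adapter: float) -> float:
    """Absolute gap between a reference baseline and the adapter score."""
    return round(reference - adapter, 4)


distill_gap = gap(K_T_STYLED, ADAPTER)   # 0.2743
cascade_gap = gap(CASCADE_TL, ADAPTER)   # 0.2862 (derived, not reported)
in_null_band = NULL_BAND[0] <= ADAPTER <= NULL_BAND[1]  # True

if __name__ == "__main__":
    print(distill_gap, cascade_gap, in_null_band)
```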
The frozen Qwen3-8B is capable of consuming the styled paralinguistic tag at 0.79+ in text form. The trained audio adapter under L_KL distillation cannot elicit the equivalent tag-conditioned response from continuous audio prefixes under the frozen-encoder regime.
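For readers unfamiliar with the objective, the sketch below shows a generic KL distillation term in pure Python. It assumes that L_KL is the KL divergence from the teacher's next-token distribution (text input) to the student's (audio-prefix input) at a single position. The direction of the KL, any temperature, and the reduction over positions are assumptions and are not confirmed by this README; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_teacher_student(teacher_logits: Sequence[float],
                       student_logits: Sequence[float]) -> float:
    """KL(teacher || student) for one token position (assumed direction)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # toy logits, not real model outputs
    student = [0.1, 0.2, 0.0]
    print(kl_teacher_student(teacher, teacher))  # zero up to float error
    print(kl_teacher_student(teacher, student))  # strictly positive
```

The term is zero only when the student reproduces the teacher's distribution, which is the sense in which the adapter fails to elicit the tag-conditioned response described above.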
Files:
- A_R1p8_styled_seed1234.pt (357 MB)
- grad_norm_clip_log_seed1234.json
Code / paper: https://github.com/Nurgali-Kadyrbek/frozen-speech-llm-stress
License: CC-BY-NC-4.0.