BTA: Stage 7 styled-teacher (BRIDGING-NULL pilot)

Single-axis change from the R1.8 cohort: the BLSP teacher's text input becomes transcript + " [stress on word: <word>]" on counterfactual-pair members and stress-individual rows. LibriSpeech and Expresso (no stress label) keep plain transcript. Everything else bit-identical to R1.8.
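
The input rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the repository's code: the function name `teacher_text` and the `stress_word` argument are hypothetical, and the tag template is copied from the description above.

```python
from typing import Optional

# Tag template as described in this card; the leading space is part of the tag.
STRESS_TAG_TEMPLATE = " [stress on word: {word}]"


def teacher_text(transcript: str, stress_word: Optional[str] = None) -> str:
    """Build the teacher's text input for one row (hypothetical helper).

    Rows with a stress label (counterfactual-pair members, stress-individual
    rows) get the tag appended. Rows without a label (e.g. LibriSpeech,
    Expresso) keep the plain transcript.
    """
    if stress_word is None:
        return transcript
    return transcript + STRESS_TAG_TEMPLATE.format(word=stress_word)


# Example rows: one with a stress label, one without.
styled = teacher_text("I never said that", stress_word="never")
plain = teacher_text("I never said that")
```

Keeping the unlabeled branch as a pure pass-through matches the single-axis design: only rows that carry a stress label see a different teacher input.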

This is the frozen-encoder ablation of the SpeechEmotionLlama \citep{kang2025frozenllm} alignment-target intervention. To our knowledge no such ablation exists in their published work.

Outcome branch: BRIDGING-NULL. Probe-G$_{\mathrm{neutral}}$ = 0.5158, inside the R0 NULL band [0.4972, 0.5272]. Oracle re-confirmation: 0.7871 (zero drift). Probe-K MLP-2 cohort = 0.3066 (still above $K_T$ = 0.290). C.1 spread = 0.021 (tightest yet); C.2 ratio = 0.970 (strongest yet).

The stage's reference baselines are decisive:

  • $K_T^{\mathrm{styled}}$ neutral = 0.7901 (gold transcript + tag in audio slot)
  • Cascade-T+L neutral = 0.8020 (Whisper transcript + tag in audio slot)
  • Stage 7 audio adapter neutral = 0.5158
  • Distillation gap: 0.7901 − 0.5158 = 0.2743 absolute
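
The arithmetic behind these figures can be re-derived with a short Python check. The numbers are copied from this card; the variable names and the `in_band` helper are hypothetical and exist only for illustration.

```python
# Reported values from this card (neutral condition).
K_T_STYLED = 0.7901      # gold transcript + tag in audio slot
CASCADE_T_L = 0.8020     # Whisper transcript + tag in audio slot
AUDIO_ADAPTER = 0.5158   # Stage 7 audio adapter
R0_NULL_BAND = (0.4972, 0.5272)


def in_band(value: float, band: tuple) -> bool:
    """Return True if value lies inside the closed interval band."""
    low, high = band
    return low <= value <= high


# Absolute distillation gap between the styled text teacher and the adapter.
distillation_gap = round(K_T_STYLED - AUDIO_ADAPTER, 4)

# The adapter falls inside the NULL band; the text-form baselines do not.
adapter_is_null = in_band(AUDIO_ADAPTER, R0_NULL_BAND)
teacher_is_null = in_band(K_T_STYLED, R0_NULL_BAND)
```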

The frozen Qwen3-8B can use the styled paralinguistic tag when it arrives in text form, reaching 0.79+. Under the frozen-encoder regime, the audio adapter trained with L_KL distillation cannot elicit the equivalent tag-conditioned response from continuous audio prefixes.

Files:

  • A_R1p8_styled_seed1234.pt (357 MB)
  • grad_norm_clip_log_seed1234.json

Code / paper: https://github.com/Nurgali-Kadyrbek/frozen-speech-llm-stress

License: CC-BY-NC-4.0.
