Whisper Small HSA Merge: AMI + Script 0-2

This repository contains a minimal Hugging Face Transformers checkpoint for the manuscript "Compositional Domain Adaptation for Automatic Speech Recognition with Headwise Selective Attention Merging".

Model Details

  • Model type: Whisper sequence-to-sequence ASR model
  • Base model: openai/whisper-small
  • Release group: Cross-corpus transfer
  • Checkpoint kind: Headwise Selective Attention (HSA) merged checkpoint
  • Manuscript role: Adult conversational cross-corpus merge
  • Source artifact: 05_cross_corpus_small/ami_hsa_merge

Method Context

This is a structured model merge for compositional domain adaptation. It composes task-specific adaptations without additional retraining by restricting parameter arithmetic to salient attention heads where the adaptations are concentrated.

Training/adaptation context: a cross-corpus composition of two adaptations, one for AMI adult conversational speech and one for scripted 0-2 child speech.
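Below is a minimal NumPy sketch of headwise selective task arithmetic on one attention projection. It is not the manuscript's implementation. Two parts of it are assumptions made for illustration:

  • Salient heads are the top `k` heads, ranked by the norm of each adaptation's per-head weight delta.
  • All selected deltas share a single `scale`.

`split_heads` and `headwise_merge` are hypothetical helper names. Each fine-tuned weight (for example, the AMI or the scripted 0-2 adaptation) adds its delta from the base model only on the heads it selects.

```python
import numpy as np


def split_heads(weight: np.ndarray, num_heads: int) -> np.ndarray:
    """Reshape a q/k/v projection weight (out_features, in_features) into per-head slices.

    Assumes the nn.Linear layout with output rows grouped contiguously by head.
    """
    out_features, in_features = weight.shape
    if out_features % num_heads:
        raise ValueError("out_features must be divisible by num_heads")
    return weight.reshape(num_heads, out_features // num_heads, in_features)


def headwise_merge(base, finetuned, num_heads, k, scale=1.0):
    """Add each adaptation's delta to the base weight, restricted to its top-k heads.

    Hypothetical selection rule: heads ranked by the L2 norm of the per-head delta.
    """
    base_heads = split_heads(np.asarray(base, dtype=np.float64), num_heads)
    merged = base_heads.copy()  # never modify the base weight in place
    for weight in finetuned:
        # Per-head task vector: fine-tuned weight minus base weight.
        delta = split_heads(np.asarray(weight, dtype=np.float64), num_heads) - base_heads
        norms = np.linalg.norm(delta.reshape(num_heads, -1), axis=1)
        selected = np.argsort(norms)[::-1][:k]  # heads where this adaptation is concentrated
        merged[selected] += scale * delta[selected]
    return merged.reshape(np.shape(base))
```

For whisper-small, each `q_proj`, `k_proj`, or `v_proj` weight has shape (768, 768) and holds 12 heads of dimension 64, so `num_heads=12`. An `out_proj` weight groups its heads along the input columns instead, so it would need the transposed split. Setting `k=num_heads` reduces the sketch to plain task arithmetic over the whole projection.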

The broader manuscript studies whether speech foundation model adaptations for different distribution shifts, such as acoustic condition, speaking style, speaker population, and dialect, can be recombined for low-resource and intersectional ASR without direct joint-supervision data.

Intended Use

Use this checkpoint to reproduce or extend the paper's ASR model-merging experiments. It is intended for research on child ASR, compositional domain adaptation, robustness, cross-corpus transfer, dialectal variation, and scaling behavior across Whisper model sizes.

How To Load

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor (feature extractor + tokenizer) and merged weights from the Hub.
model_id = "balaji1312/whisper_small_hsa_ami_script_0_2"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
```
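Once loaded, the model can transcribe a single utterance. The sketch below rests on a few assumptions:

  • The input is a NumPy waveform. For example, `soundfile.read` returns multichannel audio in a (num_samples, num_channels) layout.
  • Plain linear interpolation stands in for a proper resampler.
  • `to_mono_16k` and `transcribe` are illustrative helpers, not part of this release.

Whisper's feature extractor expects 16 kHz input. With default settings, it pads or truncates audio to a 30-second window.

```python
import numpy as np

WHISPER_SAMPLING_RATE = 16_000  # Whisper's feature extractor expects 16 kHz audio


def to_mono_16k(waveform, sampling_rate):
    """Downmix to mono and linearly resample to 16 kHz.

    Assumes multichannel input is laid out as (num_samples, num_channels).
    Linear interpolation is a rough stand-in for a band-limited resampler
    and can alias when downsampling.
    """
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average channels into one track
    if sampling_rate != WHISPER_SAMPLING_RATE:
        n_out = int(round(audio.shape[0] * WHISPER_SAMPLING_RATE / sampling_rate))
        t_in = np.arange(audio.shape[0]) / sampling_rate
        t_out = np.arange(n_out) / WHISPER_SAMPLING_RATE
        audio = np.interp(t_out, t_in, audio).astype(np.float32)
    return audio


def transcribe(model, processor, waveform, sampling_rate):
    """Transcribe one utterance (at most 30 s) with a loaded Whisper model and processor."""
    audio = to_mono_16k(waveform, sampling_rate)
    inputs = processor(audio, sampling_rate=WHISPER_SAMPLING_RATE, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

For example, `waveform, sr = soundfile.read("utterance.wav")` followed by `transcribe(model, processor, waveform, sr)` returns the decoded text (`utterance.wav` is a placeholder path). For longer recordings, chunked decoding is more appropriate. For careful resampling, use a dedicated library such as torchaudio or librosa.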

To load from a local copy of the release tree (for example, before the checkpoint is uploaded to the Hub):

```python
from pathlib import Path
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Location of this checkpoint inside the curated release tree.
model_dir = Path("final_release_models") / "05_cross_corpus_small" / "whisper_small_hsa_ami_script_0_2"
processor = WhisperProcessor.from_pretrained(model_dir)
model = WhisperForConditionalGeneration.from_pretrained(model_dir)
```

Release Files

This model card was generated for the curated release tree. The model-loading payload consists of:

config.json, generation_config.json, preprocessor_config.json, tokenizer_config.json, vocab.json, merges.txt, normalizer.json, special_tokens_map.json, added_tokens.json, model.safetensors

Training state, optimizer state, decode logs, hypotheses, references, and intermediate experiment outputs were intentionally omitted.
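Before loading from a local directory, a quick check against the payload listed above can catch an incomplete copy. This is a convenience sketch. `missing_release_files` is a hypothetical helper, and it only checks that each file exists, not that its contents are valid.

```python
from pathlib import Path

# Model-loading payload listed in this card; training and optimizer state are intentionally absent.
RELEASE_FILES = (
    "config.json", "generation_config.json", "preprocessor_config.json",
    "tokenizer_config.json", "vocab.json", "merges.txt", "normalizer.json",
    "special_tokens_map.json", "added_tokens.json", "model.safetensors",
)


def missing_release_files(model_dir) -> list[str]:
    """Return the payload files that are absent from a local release directory."""
    model_dir = Path(model_dir)
    return [name for name in RELEASE_FILES if not (model_dir / name).is_file()]
```

An empty list means every payload file is present. Otherwise, the returned names show what to recopy before calling `from_pretrained`.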

Limitations

The checkpoint is released for research reproducibility. Results outside the paper's child ASR, robustness, cross-corpus, dialectal, and scaling-law settings are not characterized here. Reproducing WER numbers requires the manuscript evaluation pipeline and authorized access to the relevant speech corpora; no evaluation audio or transcripts are redistributed in this model folder.
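For orientation, word error rate (WER) is conventionally computed as follows: take the word-level edit distance (substitutions + deletions + insertions) between each hypothesis and its reference, pool the edits over the test set, and divide by the total number of reference words.

The sketch below is not the manuscript evaluation pipeline. It assumes whitespace tokenization and assumes that any text normalization has already been applied to both sides. `word_edit_distance` and `corpus_wer` are illustrative names. Because normalization choices can shift WER noticeably, numbers from this sketch are not directly comparable to the paper's.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    # prev[j]: distance between the reference prefix seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Pooled WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    edits = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    ref_words = sum(len(r.split()) for r in references)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return edits / ref_words
```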

Citation

If you use this checkpoint, please cite the manuscript:

```bibtex
@article{shankara2026compositional,
  title = {Compositional Domain Adaptation for Automatic Speech Recognition with Headwise Selective Attention Merging},
  author = {Shankara, Natarajan Balaji and Wang, Zilai and Eren, Eray and Alwan, Abeer},
  year = {2026},
  note = {Manuscript submitted to Computer Speech & Language}
}
```