Whisper Medium SFT: OGI Script 0-2
This repository contains a minimal Hugging Face Transformers checkpoint for the manuscript "Compositional Domain Adaptation for Automatic Speech Recognition with Headwise Selective Attention Merging".
Model Details
- Model type: Whisper sequence-to-sequence ASR model
- Base model: openai/whisper-medium
- Release group: Scaling-law checkpoints
- Checkpoint kind: Single-source supervised fine-tuned checkpoint
- Manuscript role: Scaling-law scripted baseline
- Source artifact: 07_scaling_laws/whisper_medium_train_scrip_bk1
Method Context
This is a single-source fine-tuned checkpoint. In the manuscript it serves as a source model, baseline, or task-vector endpoint for studying how distribution-shift factors can be recombined.
Training/adaptation context: OGI scripted child speech adaptation.
The broader manuscript studies whether speech foundation model adaptations for different distribution shifts, such as acoustic condition, speaking style, speaker population, and dialect, can be recombined for low-resource and intersectional ASR without direct joint-supervision data.
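To illustrate what "task-vector endpoint" can mean in practice, here is a minimal sketch of the usual task-arithmetic definition: the task vector is the per-parameter difference between the fine-tuned weights and the base weights. This is an assumption about the general idea only; it does not reproduce the manuscript's headwise selective attention merging, and the function names `task_vector` and `apply_task_vector` are hypothetical.

```python
def task_vector(base_state, finetuned_state):
    """Return finetuned - base for every parameter name present in both states.

    Works on any values that support subtraction (floats, NumPy arrays,
    PyTorch tensors), so it can be applied to model.state_dict() outputs.
    """
    shared = base_state.keys() & finetuned_state.keys()
    return {name: finetuned_state[name] - base_state[name] for name in shared}


def apply_task_vector(base_state, vector, alpha=1.0):
    """Return base + alpha * vector; parameters without an entry are kept as-is."""
    merged = dict(base_state)
    for name, delta in vector.items():
        merged[name] = base_state[name] + alpha * delta
    return merged


# Toy example with scalar "parameters" (hypothetical values, not real weights).
base = {"w": 1.0, "b": 0.0}
finetuned = {"w": 1.5, "b": -0.5}
tv = task_vector(base, finetuned)
restored = apply_task_vector(base, tv, alpha=1.0)
```

With `alpha=1.0` the fine-tuned weights are recovered exactly; other values of `alpha` scale the adaptation before it is added back to the base model.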
Intended Use
Use this checkpoint to reproduce or extend the paper's ASR model-merging experiments. It is intended for research on child ASR, compositional domain adaptation, robustness, cross-corpus transfer, dialectal variation, and scaling behavior across Whisper model sizes.
How To Load
from transformers import WhisperForConditionalGeneration, WhisperProcessor
model_id = "balaji1312/whisper_medium_sft_ogi_script_0_2"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
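Once loaded, the checkpoint can be used for transcription like any other Whisper model in Transformers. The following is a minimal sketch, not the manuscript's decoding setup: it assumes a mono waveform already resampled to 16 kHz and uses default generation settings. The helper name `transcribe` is hypothetical.

```python
MODEL_ID = "balaji1312/whisper_medium_sft_ogi_script_0_2"
TARGET_SAMPLING_RATE = 16_000  # Whisper feature extractors expect 16 kHz audio


def transcribe(audio, sampling_rate=TARGET_SAMPLING_RATE, model_id=MODEL_ID):
    """Transcribe one mono waveform (1-D float array) and return the text."""
    # Imported lazily so defining this helper does not load the heavy libraries.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    if sampling_rate != TARGET_SAMPLING_RATE:
        raise ValueError("Resample the audio to 16 kHz before calling transcribe().")

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    # Convert the waveform to log-Mel input features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)
    # Decode token ids to text, dropping Whisper's special tokens.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling `transcribe(waveform)` downloads the checkpoint on first use. For repeated calls, load the processor and model once outside the function instead.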
For local use before upload:
from pathlib import Path
from transformers import WhisperForConditionalGeneration, WhisperProcessor
model_dir = Path("final_release_models") / "07_scaling_laws" / "whisper_medium_sft_ogi_script_0_2"
processor = WhisperProcessor.from_pretrained(model_dir)
model = WhisperForConditionalGeneration.from_pretrained(model_dir)
Release Files
This model card was generated for the curated release tree. The model-loading payload consists of:
config.json, generation_config.json, preprocessor_config.json, tokenizer_config.json, tokenizer.json, vocab.json, merges.txt, normalizer.json, special_tokens_map.json, added_tokens.json, model.safetensors
Training state, optimizer state, decode logs, hypotheses, references, and intermediate experiment outputs were intentionally omitted.
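Before loading from a local copy, it can help to confirm that the folder contains every file in the list above. The sketch below is a convenience check only; the helper name `missing_release_files` is hypothetical, and the file names are taken from this model card.

```python
from pathlib import Path

# File names as listed in the Release Files section of this model card.
RELEASE_FILES = (
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "normalizer.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "model.safetensors",
)


def missing_release_files(model_dir):
    """Return the listed file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in RELEASE_FILES if not (model_dir / name).is_file()]
```

An empty list means all listed files are present. The check does not validate file contents.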
Limitations
The checkpoint is released for research reproducibility. Results outside the paper's child ASR, robustness, cross-corpus, dialectal, and scaling-law settings are not characterized here. Reproducing WER numbers requires the manuscript evaluation pipeline and authorized access to the relevant speech corpora; no evaluation audio or transcripts are redistributed in this model folder.
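For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below shows that standard definition only. It is not the manuscript evaluation pipeline, it applies no text normalization, and its scores should not be expected to match the paper's numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("The reference must contain at least one word.")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.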
Citation
If you use this checkpoint, please cite the manuscript:
@article{shankara2026compositional,
title = {Compositional Domain Adaptation for Automatic Speech Recognition with Headwise Selective Attention Merging},
author = {Shankara, Natarajan Balaji and Wang, Zilai and Eren, Eray and Alwan, Abeer},
year = {2026},
note = {Manuscript submitted to Computer Speech \& Language}
}