Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR

Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)

Paper

📄 Download PDF

Abstract

We fine-tune OpenAI's Whisper large-v3 on 1,367 hours of Swiss German broadcast subtitles, producing the first openly available, honestly evaluated Swiss German ASR models. We compare LoRA and full fine-tuning, finding comparable performance (cWER 13.8–13.9%) when LoRA scaling is properly configured. Our harmonized WER analysis reveals that nearly half of the measured 25.6% WER originates from semantically correct outputs penalized for stylistic differences. We further document systematic benchmark contamination in previously published Swiss German ASR systems and propose content WER (cWER) as a fairer evaluation metric for dialect-to-standard transcription.
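To make the gap between raw and harmonized WER concrete, here is a minimal sketch of word-level WER computed before and after a simple text normalization. The `normalize` rules (lowercasing, punctuation stripping, `ß` to `ss`) and the example sentences are illustrative assumptions. They are not the paper's harmonization procedure, and the sketch does not reproduce the paper's definition of cWER.

```python
import re


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / len(ref)


def normalize(text):
    """Hypothetical harmonization: lowercase, drop punctuation, map ß to ss."""
    text = text.lower().replace("ß", "ss")
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


# Illustrative sentences, not taken from the paper's test set.
reference = "Der Zug fährt um 8 Uhr, sagte er."
hypothesis = "der Zug fährt um 8 Uhr sagte er"

raw = wer(reference, hypothesis)
harmonized = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw:.3f}, harmonized WER: {harmonized:.3f}")
```

In this toy case the hypothesis is semantically identical to the reference, yet three of eight reference words count as errors until casing and punctuation are harmonized. This is the kind of stylistic penalty the abstract refers to.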

Models

Citation

@misc{akeret2026swissgerman,
  title={Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6\% WER (13.8\% cWER)},
  author={Akeret, Felix},
  year={2026},
  url={https://huggingface.co/Flix-AI/swissgerman-whisper-paper}
}