Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR
Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)
Paper
Download PDF
Abstract
We fine-tune OpenAI's Whisper large-v3 on 1,367 hours of Swiss German broadcast subtitles, producing the first openly available, honestly evaluated Swiss German ASR models. We compare LoRA and full fine-tuning, finding comparable performance (cWER 13.8–13.9%) when LoRA scaling is properly configured. Our harmonized WER analysis reveals that nearly half of the measured 25.6% WER originates from semantically correct outputs penalized for stylistic differences. We further document systematic benchmark contamination in previously published Swiss German ASR systems and propose content WER (cWER) as a fairer evaluation metric for dialect-to-standard transcription.
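To make the headline numbers concrete, the following is a minimal sketch of plain word error rate as word-level edit distance, plus the arithmetic behind the "nearly half" figure. It is not the paper's evaluation code: the function names `word_error_rate` and `stylistic_share` and the example sentences are hypothetical. The exact definition of cWER is given in the paper and is not reproduced here. The sketch only assumes that the gap between WER and cWER corresponds to the stylistic portion.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def stylistic_share(wer: float, cwer: float) -> float:
    """Fraction of WER not counted by cWER (assumption: WER minus cWER)."""
    return (wer - cwer) / wer


# A hypothesis with the same meaning but a different surface form
# ("ins" vs. "in das") still costs two edits against five reference words.
print(word_error_rate("wir gehen heute ins kino", "wir gehen heute in das kino"))  # 0.4

# Share of the reported 25.6% WER that lies above the 13.8% cWER.
print(f"{stylistic_share(25.6, 13.8):.0%}")  # 46%
```

Under that assumption, 11.8 of the 25.6 WER points, about 46%, lie above the cWER, which matches the "nearly half" stated in the abstract.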
Models
- Flix-AI/flix-swissgerman-lora: LoRA adapter (Apache 2.0)
- Flix-AI/flix-swissgerman-full: Full fine-tuned model (Apache 2.0)
Citation
@misc{akeret2026swissgerman,
title={Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6\% WER (13.8\% cWER)},
author={Akeret, Felix},
year={2026},
url={https://huggingface.co/Flix-AI/swissgerman-whisper-paper}
}