Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR

Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)

Paper

📄 arXiv: https://arxiv.org/abs/2606.07608

Abstract

We fine-tune OpenAI's Whisper large-v3 on 1,367 hours of Swiss German broadcast subtitles, producing the first openly available, honestly evaluated Swiss German ASR models. We compare LoRA and full fine-tuning, finding comparable performance (cWER 13.8–13.9%) when LoRA scaling is properly configured. Our harmonized WER analysis reveals that nearly half of the measured 25.6% WER originates from semantically correct outputs penalized for stylistic differences. We further document systematic benchmark contamination in previously published Swiss German ASR systems and propose content WER (cWER) as a fairer evaluation metric for dialect-to-standard transcription.
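To make the convention-mismatch point concrete, the following is a minimal sketch of plain word error rate, computed as a word-level edit distance divided by the reference length. It is not the paper's evaluation code, and it does not implement cWER, whose definition is not given on this page. The sentence pair is an invented illustration, and the final arithmetic assumes that the gap between the reported WER and cWER corresponds to the stylistic share mentioned in the abstract.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Plain WER: edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Invented example: both sentences mean the same thing, but the contraction
# "ins" versus "in das" costs one substitution and one insertion.
example_wer = wer("wir gehen heute ins Kino", "wir gehen heute in das Kino")

# Assumption: the WER/cWER gap is the stylistically penalized share of errors.
stylistic_share = (25.6 - 13.8) / 25.6
```

Here `example_wer` is 0.4 even though the hypothesis is semantically correct, and `stylistic_share` is about 0.46, consistent with the "nearly half" figure in the abstract under the stated assumption.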

Models

🤗 Flix-AI/flix-swissgerman-lora — LoRA adapter (Apache 2.0)
🤗 Flix-AI/flix-swissgerman-full — Full fine-tuned model (Apache 2.0)
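
A minimal loading sketch for the two checkpoints follows. It assumes that the full model repository is a standard `transformers` Whisper checkpoint, that the LoRA repository is a standard `peft` adapter trained on `openai/whisper-large-v3`, and that the processor can be taken from the base model; none of this is confirmed on this page. The function names are hypothetical helpers, and the heavy imports are kept inside the functions so the snippet can be imported without downloading anything.

```python
BASE_MODEL_ID = "openai/whisper-large-v3"       # base model named in the abstract
LORA_ADAPTER_ID = "Flix-AI/flix-swissgerman-lora"
FULL_MODEL_ID = "Flix-AI/flix-swissgerman-full"


def build_full_pipeline():
    """Build an ASR pipeline from the fully fine-tuned checkpoint.

    Assumes the repository ships model, tokenizer, and feature extractor files.
    """
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=FULL_MODEL_ID)


def build_lora_pipeline():
    """Build an ASR pipeline from the base model plus the LoRA adapter.

    Assumes the adapter is in standard peft format and targets BASE_MODEL_ID.
    """
    from peft import PeftModel
    from transformers import (
        WhisperForConditionalGeneration,
        WhisperProcessor,
        pipeline,
    )

    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    # Attach the adapter, then fold its weights into the base model.
    model = PeftModel.from_pretrained(base, LORA_ADAPTER_ID).merge_and_unload()
    return pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
    )
```

With either helper, a call such as `build_full_pipeline()("sample.wav")["text"]` would return the transcript, where `sample.wav` is a placeholder path. Passing `generate_kwargs={"language": "german", "task": "transcribe"}` may be appropriate given the dialect-to-standard setup, but that setting is an assumption rather than something documented here.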

Citation

@misc{akeret2026swissgerman,
  title={Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6\% WER (13.8\% cWER)},
  author={Akeret, Felix},
  year={2026},
  url={https://arxiv.org/abs/2606.07608},
  eprint={2606.07608},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Paper • 2606.07608 • Published May 29