How to use Flix-AI/flix-swiss-german-full with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="Flix-AI/flix-swiss-german-full")
# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("Flix-AI/flix-swiss-german-full")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Flix-AI/flix-swiss-german-full")

A fine-tuned version of openai/whisper-large-v3 for Swiss German (Schweizerdeutsch) automatic speech recognition. The model transcribes Swiss German dialect speech into grammatically correct Standard German text.
This is the first publicly available, fully fine-tuned Whisper model for Swiss German.
| Metric | Value | Notes |
|---|---|---|
| WER (measured) | 25.60% | ASGDTS, 5,750 samples, honest evaluation |
| cWER (content errors only) | 13.8% | Excludes style/convention differences |
| sWER (style component) | 11.3% | Valid alternative translations penalized by WER |
| bWER (bias-corrected) | 8.5% | Estimated true error rate |
| Whisper large-v3 baseline | 28.56% | Zero-shot, no fine-tuning |
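Our WER of 25.60% should be interpreted carefully: as the table above shows, a large part of it (sWER, 11.3%) comes from valid alternative translations that WER penalizes, while content errors alone account for 13.8% (cWER).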
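For readers unfamiliar with the metric, the sketch below shows how a plain word error rate is typically computed (word-level edit distance divided by the number of reference words) and how the relative improvement over the zero-shot baseline follows from the two WER values in the table. It is not the evaluation script used for this model: `word_error_rate` is a hypothetical helper, no text normalization is applied, and the cWER, sWER and bWER variants are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Relative WER reduction of the fine-tuned model over the zero-shot
# baseline, using the two figures reported in the table.
baseline_wer = 28.56
fine_tuned_wer = 25.60
relative_reduction = (baseline_wer - fine_tuned_wer) / baseline_wer

print(f"example WER: {word_error_rate('das ist ein test', 'das ist test'):.2f}")
print(f"relative WER reduction: {relative_reduction:.1%}")
```

Because a valid alternative Standard German rendering still counts as a substitution under this definition, the raw WER can overstate the number of genuine content errors.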
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch
model_id = "Flix-AI/flix-swiss-german-full"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Transcribe Swiss German audio
audio_array = ...  # numpy array, 16 kHz mono
input_features = processor(
    audio_array, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device, dtype=torch.bfloat16)
predicted_ids = model.generate(input_features, language="de", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
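The snippet above leaves `audio_array` as a placeholder. As one possible way to fill it, the following sketch reads a 16-bit PCM WAV file using only the Python standard library and checks that it is already 16 kHz mono. `load_wav_16k_mono` is a hypothetical helper, not part of this model or of Transformers, and it deliberately rejects other formats instead of resampling them.

```python
import array
import sys
import wave


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a 16-bit PCM WAV file and return samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is stored little-endian

    # Scale integer samples to floating point in [-1.0, 1.0).
    return [sample / 32768.0 for sample in samples]
```

The returned list can be wrapped with `numpy.asarray(samples, dtype=numpy.float32)` and used as `audio_array`. Recordings at other sample rates or with several channels would first need resampling or downmixing with an audio library, which is not shown here.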
| Source | Hours | License | Content |
|---|---|---|---|
| SRF Mediathek | 848h | Research use (Art. 24d URG) | Broadcast subtitles (news, entertainment, documentary) |
| Swiss Parliament (SPC v2) | 202h | CC BY 4.0 | Parliamentary speeches (Grosser Rat BE) |
| YouTube | 151h | Research use (Art. 24d URG) | 25 institutional channels (cantons, police, podcasts) |
| PlaySuisse | 165h | Research use (Art. 24d URG) | Swiss films and series |
| Total | 1,367h | | |
No training data is redistributed with this model. The model was trained under the Swiss text and data mining research exception (Art. 24d URG).
| Parameter | Value |
|---|---|
| Trainable parameters | 1,543,490,560 (100%) |
| Optimizer | AdamW |
| Learning rate | 1×10⁻⁵ (cosine decay) |
| Warmup steps | 500 |
| Effective batch size | 32 |
| Precision | bfloat16 |
| Gradient checkpointing | Enabled |
| SpecAugment | Enabled |
| Training time | ~73 hours (2 epochs) |
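To illustrate how the learning-rate entries in the table fit together, here is a minimal sketch of a schedule with 500 warmup steps followed by cosine decay from a peak of 1×10⁻⁵. Two points are assumptions rather than facts from this card: the warmup is taken to be linear, and the total number of optimizer steps (`total_steps`) is a hypothetical value, since the card does not state it.

```python
import math

PEAK_LR = 1e-5       # from the table
WARMUP_STEPS = 500   # from the table


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warmup (assumed) to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, chosen only to make the example concrete.
total_steps = 10_000
for step in (0, 250, 500, 5_250, 10_000):
    print(step, learning_rate(step, total_steps))
```

The schedule reaches its peak exactly at the end of warmup and falls to half the peak midway through the decay phase, which is the usual shape of warmup-plus-cosine schedules.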
The training data covers all major Swiss German dialect regions:
| Dialect | Primary Source |
|---|---|
| Züridütsch | SRF, YouTube |
| Berndeutsch | SPC v2 (dominant), SRF |
| Luzernerdeutsch | SRF, YouTube |
| Baseldeutsch | SRF, YouTube |
| St. Gallerdeutsch | SRF, YouTube |
| Walliserdeutsch | SRF, PlaySuisse |
| Bündnerdeutsch | YouTube |
| Appenzellerdeutsch | SRF |
@article{akeret2026whisper-swiss-german,
title={Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6\% WER (13.8\% cWER)},
author={Akeret, Felix},
year={2026},
url={https://huggingface.co/Flix-AI/flix-swiss-german-full}
}