How to use Flix-AI/flix-swiss-german-lora with PEFT:
from peft import PeftModel
from transformers import AutoModelForSpeechSeq2Seq
# Whisper is a speech-to-text model, so it is loaded through AutoModelForSpeechSeq2Seq
base_model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")
model = PeftModel.from_pretrained(base_model, "Flix-AI/flix-swiss-german-lora")

A LoRA adapter for openai/whisper-large-v3 fine-tuned for Swiss German (Schweizerdeutsch) automatic speech recognition. The adapter transcribes Swiss German dialect speech into grammatically correct Standard German text.
To our knowledge, this is among the first publicly available LoRA adapters for Swiss German ASR released with an honest, reproducible evaluation.
| Metric | Value | Notes |
|---|---|---|
| WER (measured) | 25.32% | ASGDTS, 200 samples (seed=42), honest evaluation |
| cWER (content errors only) | 13.9% | Excludes style/convention differences |
| sWER (style component) | 11.3% | Valid alternative translations penalized by WER |
| bWER (bias-corrected) | 8.5% | Estimated true error rate |
| Whisper large-v3 baseline | 28.56% | Zero-shot, no fine-tuning |
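Our WER of 25.32% should be interpreted carefully: WER also penalizes valid alternative translations that differ from the reference only in style or convention. That style component (sWER) accounts for 11.3% in the table above, the content-only error rate (cWER) is 13.9%, and the estimated bias-corrected error rate (bWER) is 8.5%.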
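To make the metric concrete, here is a minimal sketch of how a plain WER can be computed as word-level edit distance divided by the reference length. The normalization (lowercasing and whitespace splitting) is an assumption for illustration; the card does not state which text normalization was used, and the cWER, sWER, and bWER variants are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# A dropped word counts as one error out of five reference words.
content_error = word_error_rate("das ist ein kleines haus", "das ist ein haus")

# A valid paraphrase (perfect tense vs. preterite) is still penalized by WER.
style_difference = word_error_rate("ich habe gegessen", "ich ass")

print(f"content error example: {content_error:.2f}")
print(f"style difference example: {style_difference:.2f}")
```

The second example shows why a style component can inflate WER: both sentences mean the same thing, yet two of the three reference words count as errors.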
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import torch

base_model_id = "openai/whisper-large-v3"
adapter_id = "Flix-AI/flix-swiss-german-lora"

# Load the processor and the base model, then attach the LoRA adapter
processor = WhisperProcessor.from_pretrained(base_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    base_model_id, torch_dtype=torch.float32, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Transcribe Swiss German audio
audio_array = ...  # numpy array, 16kHz mono
input_features = processor(
    audio_array, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device)

# Pass input_features by keyword so the call works however the PEFT wrapper forwards arguments
predicted_ids = model.generate(input_features=input_features, language="de", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
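The snippet above handles a single clip. Whisper's feature extractor works on 30-second windows, so longer recordings need to be split before transcription. The following is a minimal sketch of a naive split; `split_into_windows` is a hypothetical helper, not part of this model or of `transformers`, and it does not handle words cut at window boundaries.

```python
def split_into_windows(samples, sample_rate=16000, window_seconds=30.0):
    """Split a 1-D sample sequence into consecutive windows of at most window_seconds."""
    if sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("sample_rate and window_seconds must be positive")
    window = int(sample_rate * window_seconds)
    # Slicing works for Python lists and for numpy arrays alike.
    return [samples[start:start + window] for start in range(0, len(samples), window)]


# 70 seconds of silence at 16 kHz, used only to illustrate the window sizes.
audio = [0.0] * (16000 * 70)
chunks = split_into_windows(audio)
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)
```

Each chunk can then be passed through the processor and `model.generate` as shown above, and the resulting transcriptions joined in order.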
| Parameter | Value |
|---|---|
| Rank (r) | 160 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, out_proj, fc1, fc2 |
| Task type | SEQ_2_SEQ_LM |
| PEFT version | 0.18.1 |
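As a rough illustration of what these settings imply, the sketch below derives the standard LoRA scaling factor (alpha divided by rank) and estimates the adapter size. The dictionary keys mirror the argument names of PEFT's `LoraConfig`. The Whisper large-v3 dimensions (model width 1280, feed-forward width 5120, 32 encoder and 32 decoder layers) are assumptions not stated in this card, and the count assumes every listed module in every layer is adapted.

```python
lora_settings = {
    "r": 160,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
    "task_type": "SEQ_2_SEQ_LM",
}

# Assumed Whisper large-v3 dimensions (not taken from the table above).
D_MODEL = 1280
FFN_DIM = 5120
ENCODER_LAYERS = 32
DECODER_LAYERS = 32


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the A (rank x in) and B (out x rank) matrices for one linear layer."""
    return rank * (in_features + out_features)


rank = lora_settings["r"]
scaling = lora_settings["lora_alpha"] / rank  # standard LoRA scaling, alpha / r

attention_projection = lora_params(D_MODEL, D_MODEL, rank)
feed_forward = lora_params(D_MODEL, FFN_DIM, rank) + lora_params(FFN_DIM, D_MODEL, rank)

# An encoder layer has one attention block (q, k, v, out);
# a decoder layer has self-attention and cross-attention, so eight projections.
per_encoder_layer = 4 * attention_projection + feed_forward
per_decoder_layer = 8 * attention_projection + feed_forward
estimated_total = ENCODER_LAYERS * per_encoder_layer + DECODER_LAYERS * per_decoder_layer

print(f"scaling factor: {scaling}")
print(f"estimated adapter parameters: {estimated_total:,}")
```

With alpha smaller than the rank, the adapter update is scaled down (here by 0.2) before being added to the frozen weights.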
| Source | Hours | License | Content |
|---|---|---|---|
| SRF Mediathek | 690h | Research use (Art. 24d URG) | Broadcast subtitles (news, entertainment, documentary) |
| Swiss Parliament (SPC v2) | 202h | CC BY 4.0 | Parliamentary speeches (Grosser Rat BE) |
| YouTube | 151h | Research use (Art. 24d URG) | 25 institutional channels (cantons, police, podcasts) |
| PlaySuisse | 49h | Research use (Art. 24d URG) | Swiss films and series |
| Total | 1,092h | | |
No training data is redistributed with this model. The model was trained under the Swiss text and data mining research exception (Art. 24d URG).
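To see how the training mix is weighted, the hours from the source table can be turned into shares. This is simple arithmetic on the figures listed above, and the variable names are illustrative.

```python
hours_by_source = {
    "SRF Mediathek": 690,
    "Swiss Parliament (SPC v2)": 202,
    "YouTube": 151,
    "PlaySuisse": 49,
}

total_hours = sum(hours_by_source.values())
shares = {name: hours / total_hours for name, hours in hours_by_source.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Total: {total_hours}h")
```

Broadcast subtitles from SRF make up well over half of the data, which is worth keeping in mind when judging performance on other domains.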
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2×10⁻⁴ (cosine decay) |
| Warmup steps | 500 |
| Effective batch size | 32 |
| Precision | float32 |
| SpecAugment | Enabled |
| Training time | ~60 hours |
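The learning-rate row can be read as a warmup followed by cosine decay. The sketch below assumes linear warmup over the 500 steps and decay to zero, the same shape that `transformers.get_cosine_schedule_with_warmup` produces. The total step count is hypothetical, since the card does not report it, and the exact scheduler used in training is not confirmed.

```python
import math


def learning_rate(step: int, total_steps: int, peak_lr: float = 2e-4, warmup_steps: int = 500) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero at total_steps."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 10_000  # hypothetical value, not reported in the card

for step in (0, 250, 500, 5_250, 10_000):
    print(f"step {step:>6}: lr = {learning_rate(step, TOTAL_STEPS):.2e}")
```

The rate rises to 2×10⁻⁴ at step 500, falls to half of that midway through the decay phase, and reaches zero at the final step.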
The training data covers all major Swiss German dialect regions:
| Dialect | Primary Source |
|---|---|
| Züridütsch | SRF, YouTube |
| Berndeutsch | SPC v2 (dominant), SRF |
| Luzernerdeutsch | SRF, YouTube |
| Baseldeutsch | SRF, YouTube |
| St. Gallerdeutsch | SRF, YouTube |
| Walliserdeutsch | SRF, PlaySuisse |
| Bündnerdeutsch | YouTube |
| Appenzellerdeutsch | SRF |
@article{akeret2026whisper-swiss-german,
title={Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6\% WER (13.8\% cWER)},
author={Akeret, Felix},
year={2026},
url={https://huggingface.co/Flix-AI/flix-swiss-german-lora}
}