metadata
language:
- sk
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
base_model: openai/whisper-large-v3
datasets:
- erikbozik/slovak-plenary-asr-corpus
metrics:
- wer
model-index:
- name: whisper-large-v3-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - name: WER
      type: wer
      value: 11.6
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - name: WER
      type: wer
      value: 5.5
license: mit
Whisper Large-v3 — Fine-tuned on Slovak Plenary ASR Corpus
This model is a fine-tuned version of openai/whisper-large-v3.
It is adapted for Slovak ASR using SloPalSpeech: 2,806 hours of aligned, ≤30 s speech–text pairs from official plenary sessions of the Slovak National Council.
- Language: Slovak
- Domain: Parliamentary / formal speech
- Training data: 2,806 h
- Intended use: Slovak speech recognition; strongest in formal/public-speaking contexts
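A minimal usage sketch with the Transformers `pipeline` API follows. The repository id `erikbozik/whisper-large-v3-sk` is an assumption pieced together from the model name and the dataset owner in the metadata, so replace it with the actual id. The helper names `build_transcriber` and `transcribe` are illustrative only.

```python
# Assumed repository id; adjust to the actual one on the Hub.
MODEL_ID = "erikbozik/whisper-large-v3-sk"


def build_transcriber(model_id: str = MODEL_ID, device: str = "cpu"):
    """Create a Transformers ASR pipeline configured for Slovak transcription."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,
        chunk_length_s=30,  # Whisper operates on windows of at most 30 s
        generate_kwargs={"language": "slovak", "task": "transcribe"},
    )


def transcribe(path: str, transcriber=None) -> str:
    """Transcribe one audio file and return the recognized text."""
    transcriber = transcriber or build_transcriber()
    return transcriber(path)["text"]
```

Calling `transcribe("speech.wav")` (a hypothetical file) downloads the weights on first use. Pass `device="cuda:0"` to `build_transcriber` to run on a GPU, and reuse the returned pipeline across files instead of rebuilding it for each call.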
🧪 Evaluation
| Dataset | Base WER | Fine-tuned WER | Δ (abs) |
|---|---|---|---|
| Common Voice 21 (sk) | 20.8 | 11.6 | -9.2 |
| FLEURS (sk) | 9.2 | 5.5 | -3.7 |
Numbers from the paper’s final benchmark runs.
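To make the table concrete, the sketch below computes word error rate as word-level edit distance over the reference length, then derives the absolute and relative change between two WER values. It is a plain illustration: the text normalization used for the reported numbers is not described here, so scores computed this way on raw transcripts may differ from the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, in percent."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (or match, cost 0)
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def wer_change(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute change, relative change in %) from base to fine-tuned."""
    return round(tuned - base, 1), round(100.0 * (tuned - base) / base, 1)


for name, base, tuned in [("Common Voice 21 (sk)", 20.8, 11.6),
                          ("FLEURS (sk)", 9.2, 5.5)]:
    print(name, wer_change(base, tuned))
```

Applied to the table, this gives a relative WER reduction of roughly 44% on Common Voice 21 and roughly 40% on FLEURS.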
🔧 Training Details
- Framework: Hugging Face Transformers
- Hardware: Multi-GPU setup (NVIDIA A10s) with Fully Sharded Data Parallel (FSDP)
- Epochs: ~2 with early stopping on validation WER
- Learning rate: 1e-5 with weight decay 0.01 to prevent overfitting
- Notes: Training required sharded checkpoints; evaluation was run separately due to runtime compatibility issues
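The sketch below collects the stated hyperparameters and illustrates one way early stopping on validation WER could be applied when checkpoints are scored in a separate pass. The `patience` value and the example WER sequence are assumptions for illustration; this card does not specify how the stopping rule was implemented.

```python
# Values stated in this card; everything else below is illustrative.
HYPERPARAMS = {
    "learning_rate": 1e-5,
    "weight_decay": 0.01,
    "num_train_epochs": 2,  # approximate ("~2")
}


def best_checkpoint(val_wers: list[float], patience: int = 2) -> int:
    """Index of the checkpoint to keep under patience-based early stopping.

    Scanning stops once `patience` consecutive checkpoints fail to improve
    on the best validation WER seen so far.
    """
    if not val_wers:
        raise ValueError("need at least one validation WER")
    best_idx, since_best = 0, 0
    for idx in range(1, len(val_wers)):
        if val_wers[idx] < val_wers[best_idx]:
            best_idx, since_best = idx, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_idx


# Hypothetical validation WERs, one per saved checkpoint.
print(best_checkpoint([14.0, 12.5, 12.1, 12.3, 12.4, 11.0], patience=2))
```

A small `patience` saves compute but can stop before a late improvement, as the last value in the hypothetical sequence shows. A larger one trades extra training time for a lower risk of stopping too soon.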
⚠️ Limitations
- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
- Multilingual performance is not guaranteed (full-parameter fine-tuning emphasized Slovak).
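The hallucination mitigations mentioned above can be sketched without any model dependency. The example scores a transcript by its zlib compression ratio (looping, repetitive output compresses unusually well) and retries decoding at increasing temperatures until the ratio falls below a threshold. The `decode` callable is a hypothetical stand-in for whatever function runs the model at a given temperature. The temperature schedule and the 2.4 threshold follow values commonly used with Whisper and are assumptions here, not settings confirmed for this model.

```python
import zlib
from typing import Callable


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size over zlib-compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def decode_with_fallback(
    decode: Callable[[float], str],  # hypothetical: runs the model at a temperature
    temperatures: tuple[float, ...] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    max_ratio: float = 2.4,
) -> tuple[str, float]:
    """Retry at rising temperatures until the output stops looking repetitive."""
    text, used = "", temperatures[0]
    for used in temperatures:
        text = decode(used)
        if compression_ratio(text) <= max_ratio:
            break
    # If every attempt fails the check, the last attempt is returned as-is.
    return text, used
```

In practice `decode` would wrap a call to the model's generation function for one audio segment. Logging the returned temperature helps identify segments that needed fallback and may deserve manual review.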
📝 Citation & Paper
For more details, please see our paper on arXiv. If you use this model in your work, please cite it as:
@misc{božík2025slopalspeech2800hourslovakspeech,
title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
author={Erik Božík and Marek Šuppa},
year={2025},
eprint={2509.19270},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.19270},
}
🙏 Acknowledgements
This work was supported by VÚB Banka, which provided the GPU resources and backing needed to carry it out, enabling progress in Slovak ASR research.