---
language:
- sk
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
base_model: openai/whisper-small
datasets:
- erikbozik/slovak-plenary-asr-corpus
metrics:
- wer
model-index:
- name: whisper-small-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - name: WER
      type: wer
      value: 25.7
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - name: WER
      type: wer
      value: 10.6
license: mit
---

# Whisper Small — Fine-tuned on Slovak Plenary ASR Corpus

This model is a fine-tuned version of [`openai/whisper-small`](https://huggingface.co/openai/whisper-small).
It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.

- **Language:** Slovak
- **Domain:** Parliamentary / formal speech
- **Training data:** 2,806 h
- **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts
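
A minimal inference sketch using the Transformers `pipeline` API might look like the following. The Hub repository ID `erikbozik/whisper-small-sk` is an assumption (this card only states the model name `whisper-small-sk`), as is the helper name `transcribe`; substitute the actual repository ID and your own audio file.

```python
# Assumed Hub repository ID; the card only states the name "whisper-small-sk".
MODEL_ID = "erikbozik/whisper-small-sk"

# Force Slovak transcription instead of relying on language auto-detection.
GENERATE_KWARGS = {"language": "sk", "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper works on windows of at most 30 s
    )
    return asr(audio_path, generate_kwargs=GENERATE_KWARGS)["text"]
```

Calling `transcribe("speech.wav")` on a local recording (a hypothetical file name) would download the checkpoint on first use and return the transcript as a string.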

## 🧪 Evaluation

| Dataset | Base WER (%) | Fine-tuned WER (%) | Δ (abs) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 58.4 | **25.7** | -32.7 |
| FLEURS (sk) | 36.1 | **10.6** | -25.5 |

*Numbers from the paper’s final benchmark runs.*
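
For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a plain-Python illustration of that definition and of the Δ column; it is not the paper's evaluation script, and it applies no text normalization because the card does not say which normalization was used. The function names are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_delta(base_wer: float, tuned_wer: float) -> float:
    """Absolute WER change in percentage points (negative means improvement)."""
    return round(tuned_wer - base_wer, 1)
```

With the table's values, `absolute_delta(58.4, 25.7)` gives `-32.7` and `absolute_delta(36.1, 10.6)` gives `-25.5`, matching the Δ column.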

## 🔧 Training Details

- **Framework:** Hugging Face Transformers
- **Hardware:** NVIDIA A10 GPUs
- **Epochs:** up to 3, with early stopping on validation WER
- **Learning rate:** roughly **40× smaller** than the Whisper pretraining learning rate
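
To make the last two bullets concrete, here is a small sketch of the arithmetic and of a patience-based stopping rule. It assumes a peak pretraining learning rate of 5e-4 for Whisper small (the value listed in the Whisper paper); the card does not state the exact fine-tuning learning rate or the early-stopping patience, so the resulting number and the `should_stop` helper are illustrative only.

```python
# Assumption: peak pretraining LR for Whisper small, as listed in the Whisper paper.
PRETRAIN_LR_SMALL = 5e-4
LR_DIVISOR = 40  # "roughly 40x smaller", per this card

# About 1.25e-5 under the assumptions above.
fine_tune_lr = PRETRAIN_LR_SMALL / LR_DIVISOR


def should_stop(val_wers: list[float], patience: int = 1) -> bool:
    """Stop once validation WER has not improved for `patience` evaluations.

    `patience` is a hypothetical setting; the card does not specify one.
    """
    if len(val_wers) <= patience:
        return False
    best_before = min(val_wers[:-patience])
    return min(val_wers[-patience:]) >= best_before
```

In a Transformers training run, the same idea is usually expressed with `EarlyStoppingCallback` and `metric_for_best_model="wer"` rather than a hand-written helper.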

## ⚠️ Limitations

- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback and compression-ratio checks at inference time.
- Multilingual performance is not guaranteed: full-parameter fine-tuning emphasized Slovak.
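
The compression-ratio check mentioned above can be approximated as a post-hoc filter on transcripts. The sketch below computes the ratio the same way the reference `openai-whisper` code does (UTF-8 byte length divided by zlib-compressed length) and flags text above 2.4, that decoder's default threshold. Treat the threshold as a starting assumption to tune on your own data; `looks_repetitive` is a hypothetical helper name, and this filter does not itself perform temperature fallback.

```python
import zlib

# Assumption: 2.4 mirrors the default threshold of the reference openai-whisper decoder.
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """UTF-8 byte length divided by zlib-compressed byte length."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """Flag transcripts that compress unusually well, a common sign of looping output."""
    return bool(text) and compression_ratio(text) > threshold
```

Segments flagged this way are candidates for re-decoding at a higher temperature or for manual review.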

## 📝 Citation & Paper

For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:

```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
  title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
  author={Erik Božík and Marek Šuppa},
  year={2025},
  eprint={2509.19270},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.19270},
}
```

## 🙏 Acknowledgements

This work was supported by [**VÚB Banka**](https://www.vub.sk), which provided the GPU resources and support needed to carry it out, enabling progress in Slovak ASR research.