---
base_model: openai/whisper-small
datasets:
- erikbozik/slovak-plenary-asr-corpus
language:
- sk
license: mit
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
model-index:
- name: whisper-small-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - type: wer
      value: 25.7
      name: WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - type: wer
      value: 10.6
      name: WER
---
# Whisper Small — Fine-tuned on Slovak Plenary ASR Corpus
This model is a fine-tuned version of [`openai/whisper-small`](https://huggingface.co/openai/whisper-small) presented in the paper [SloPal: A 60-Million-Word Slovak Parliamentary Corpus with Aligned Speech and Fine-Tuned ASR Models](https://huggingface.co/papers/2509.19270).
It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.
- **Language:** Slovak
- **Domain:** Parliamentary / formal speech
- **Training data:** 2,806 h
- **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts
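
A minimal inference sketch with the Transformers `pipeline` API follows. The Hub id `erikbozik/whisper-small-sk` is an assumption pieced together from the dataset owner and the model name in the metadata, and `speech.wav` is a hypothetical local file; adjust both to your setup.

```python
# Assumption: the checkpoint is published under this Hub id; change it if yours differs.
MODEL_ID = "erikbozik/whisper-small-sk"


def build_asr_kwargs(model_id: str = MODEL_ID) -> dict:
    """Keyword arguments for transformers.pipeline, kept separate for easy inspection."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # Whisper operates on 30 s windows; chunking lets the pipeline handle longer audio.
        "chunk_length_s": 30,
    }


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file into Slovak text."""
    # Imported lazily so the helper above works without loading transformers.
    from transformers import pipeline

    asr = pipeline(**build_asr_kwargs(model_id))
    result = asr(
        audio_path,
        # Pin language and task so Whisper does not have to auto-detect them.
        generate_kwargs={"language": "slovak", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("speech.wav")` downloads the checkpoint on first use and returns the transcript as a plain string.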
## 🧪 Evaluation
| Dataset | Base WER | Fine-tuned WER | Δ (abs) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 58.4 | **25.7** | -32.7 |
| FLEURS (sk) | 36.1 | **10.6** | -25.5 |
*Numbers from the paper’s final benchmark runs.*
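
For readers who want to see how the table's columns relate, here is a small sketch of word error rate and of the relative reduction implied by the absolute deltas. It is a plain word-level edit distance with no text normalization; the paper's exact scoring pipeline is not described here, so scores from this sketch may not match the table on the same data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(base_wer: float, tuned_wer: float) -> float:
    """Relative WER reduction in percent, given two WER values."""
    return 100.0 * (base_wer - tuned_wer) / base_wer


# Values taken from the evaluation table above.
cv_reduction = relative_reduction(58.4, 25.7)
fleurs_reduction = relative_reduction(36.1, 10.6)
```

With the table's numbers, the relative reduction comes out at roughly 56% on Common Voice 21 and roughly 71% on FLEURS.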
## 🔧 Training Details
- **Framework:** Hugging Face Transformers
- **Hardware:** NVIDIA A10 GPUs
- **Epochs:** up to 3 with early stopping on validation WER
- **Learning rate:** roughly **40× smaller** than the learning rate used for Whisper pretraining
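
To make the learning-rate and early-stopping bullets concrete, the sketch below works through the arithmetic and a patience-based stopping rule. Two things are assumptions rather than facts from this card: the pretraining learning rate of `5e-4` (the value the Whisper paper lists for the small model) and the `patience` of 2 evaluations, which is purely illustrative.

```python
# Assumption: 5e-4 is the pretraining learning rate the Whisper paper reports for "small".
WHISPER_SMALL_PRETRAIN_LR = 5e-4
LR_DIVISOR = 40  # "~40x smaller", from the training details above

# Works out to about 1.25e-5.
fine_tune_lr = WHISPER_SMALL_PRETRAIN_LR / LR_DIVISOR


def should_stop(val_wers: list[float], patience: int = 2) -> bool:
    """Return True once validation WER has not improved for `patience` evaluations.

    `patience` is a hypothetical setting; the card does not state the value used.
    """
    if len(val_wers) <= patience:
        return False
    best_before = min(val_wers[:-patience])
    # Lower WER is better, so stop when the recent window never beats the earlier best.
    return min(val_wers[-patience:]) >= best_before
```

In a Transformers training loop, the same idea is usually expressed with `EarlyStoppingCallback` together with `metric_for_best_model="wer"` and `greater_is_better=False`; whether the authors used exactly that setup is not stated here.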
## ⚠️ Limitations
- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
- Multilingual performance is not guaranteed, since full-parameter fine-tuning emphasized Slovak.
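
The compression-ratio check mentioned above can be sketched in a few lines. Repetitive, looping output compresses unusually well, so a high ratio is a cheap hallucination signal. The threshold of 2.4 is an assumption borrowed from the default in the reference `openai/whisper` decoder, and `looks_hallucinated` is a hypothetical helper name.

```python
import zlib

# Assumption: 2.4 mirrors the default threshold in the reference openai/whisper decoder.
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_hallucinated(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """Flag transcripts whose text is suspiciously compressible."""
    return compression_ratio(text) > threshold


repeated = "ďakujem " * 50  # looping output, typical of a decoding failure
normal = "Vážený pán predseda, ďakujem za slovo."
```

A flagged segment can then be re-decoded at a higher temperature or dropped, depending on how the downstream application tolerates gaps.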
## 📝 Citation & Paper
For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:
```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
author={Erik Božík and Marek Šuppa},
year={2025},
eprint={2509.19270},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.19270},
}
```
## 🙏 Acknowledgements
This work was supported by [**VÚB Banka**](https://www.vub.sk), which provided the GPU resources and backing needed to carry it out, enabling progress in Slovak ASR research.