---
language:
- sk
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
base_model: openai/whisper-small
datasets:
- erikbozik/slovak-plenary-asr-corpus
metrics:
- wer
model-index:
- name: whisper-small-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - name: WER
      type: wer
      value: 25.7
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - name: WER
      type: wer
      value: 10.6
license: mit
---

# Whisper Small: Fine-tuned on Slovak Plenary ASR Corpus

This model is a fine-tuned version of [`openai/whisper-small`](https://huggingface.co/openai/whisper-small).
It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.

- **Language:** Slovak
- **Domain:** Parliamentary / formal speech
- **Training data:** 2,806 h
- **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts

## 🧪 Evaluation

| Dataset | Base WER | Fine-tuned WER | Δ (abs) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 58.4 | **25.7** | -32.7 |
| FLEURS (sk) | 36.1 | **10.6** | -25.5 |

*Numbers from the paper’s final benchmark runs.*

## 🔧 Training Details

- **Framework:** Hugging Face Transformers
- **Hardware:** NVIDIA A10 GPUs
- **Epochs:** up to 3 with early stopping on validation WER
- **Learning rate:** ~**40× smaller** than Whisper pretraining LR

## ⚠️ Limitations

- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
- Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).

## 📝 Citation & Paper

For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:

```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
      title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
      author={Erik Božík and Marek Šuppa},
      year={2025},
      eprint={2509.19270},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.19270},
}
```

## 🙏 Acknowledgements

This work was supported by [**VÚB Banka**](https://www.vub.sk) who provided the GPU resources and backing necessary to accomplish it, enabling progress in Slovak ASR research.
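
## 🛠️ Example: compression-ratio check

The compression-ratio check suggested under Limitations can be sketched in a few lines. Hallucinated Whisper output is often a short phrase repeated many times, which compresses far better than ordinary speech. The sketch below is not part of this model's release: the function names are hypothetical, the sample sentences are made up for illustration, and the `2.4` threshold is assumed from the default used by the reference Whisper decoder, so it may need tuning for your data.

```python
import zlib

# Assumption: 2.4 mirrors the default threshold of the reference Whisper decoder.
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Return raw UTF-8 size divided by zlib-compressed size.

    Highly repetitive text compresses well and therefore scores high.
    """
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_hallucinated(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """Flag a transcript whose compression ratio exceeds the threshold."""
    if not text.strip():
        # Empty or whitespace-only output is not treated as a repetition loop.
        return False
    return compression_ratio(text) > threshold


if __name__ == "__main__":
    # Illustrative inputs only; neither string comes from the corpus.
    normal = "Vážené panie poslankyne, vážení páni poslanci, otváram dnešné rokovanie."
    looped = "ďakujem pekne " * 40
    for sample in (normal, looped):
        print(f"{compression_ratio(sample):.2f}", looks_hallucinated(sample))
```

A flagged segment could then be decoded again at a higher sampling temperature, which is the idea behind temperature fallback, or simply dropped from downstream processing.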