---
language:
- sk
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
base_model: openai/whisper-large-v3
datasets:
- erikbozik/slovak-plenary-asr-corpus
metrics:
- wer
model-index:
- name: whisper-large-v3-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - name: WER
      type: wer
      value: 11.6
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - name: WER
      type: wer
      value: 5.5
license: mit
---

# Whisper Large-v3 — Fine-tuned on Slovak Plenary ASR Corpus

This model is a fine-tuned version of [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3). It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.

- **Language:** Slovak
- **Domain:** Parliamentary / formal speech
- **Training data:** 2,806 h
- **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts

## 🧪 Evaluation

| Dataset | Base WER | Fine-tuned WER | Δ (abs) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 20.8 | **11.6** | -9.2 |
| FLEURS (sk) | 9.2 | **5.5** | -3.7 |

*Numbers from the paper's final benchmark runs.*

## 🔧 Training Details

- **Framework:** Hugging Face Transformers
- **Hardware:** Multi-GPU setup (NVIDIA A10s) with Fully Sharded Data Parallel (FSDP)
- **Epochs:** ~2, with early stopping on validation WER
- **Learning rate:** `1e-5` with weight decay `0.01` to prevent overfitting
- **Notes:** Training required sharded checkpoints; evaluation was run separately due to runtime compatibility issues

## ⚠️ Limitations

- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
- Multilingual performance is not guaranteed (full-parameter fine-tuning emphasized Slovak).

## 📝 Citation & Paper

For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:

```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
      title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
      author={Erik Božík and Marek Šuppa},
      year={2025},
      eprint={2509.19270},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.19270},
}
```

## 🙏 Acknowledgements

This work was supported by [**VÚB Banka**](https://www.vub.sk), which provided the GPU resources and backing necessary to accomplish it, enabling progress in Slovak ASR research.
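## 💡 Usage Sketch

A minimal way to try the model is the 🤗 Transformers ASR pipeline. The repo id below is an assumption taken from the model-index name (`whisper-large-v3-sk`); replace it with this model's actual Hub path, and `speech.wav` with your own audio file.

```python
import torch
from transformers import pipeline

# Assumed repo id — substitute the real Hub path of this model.
asr = pipeline(
    "automatic-speech-recognition",
    model="erikbozik/whisper-large-v3-sk",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Whisper processes up to 30 s at a time; longer audio is chunked automatically.
result = asr(
    "speech.wav",
    chunk_length_s=30,
    generate_kwargs={"language": "slovak", "task": "transcribe"},
)
print(result["text"])
```

Forcing `language="slovak"` avoids language misdetection on short or noisy clips, which is one common source of the hallucinations noted in the limitations above.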
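For reference, WER (the metric in the evaluation table above) counts word-level substitutions, insertions, and deletions against the reference transcript. The following is a minimal illustrative implementation — a sketch of the metric itself, not the paper's evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cur = d[j]
            # deletion, insertion, substitution/match
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev = cur
    return d[len(hyp)] / max(len(ref), 1)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution plus one deletion over four reference words.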