| --- |
| language: |
| - sk |
| tags: |
| - speech |
| - asr |
| - whisper |
| - slovak |
| - parliament |
| - legal |
| - politics |
| base_model: openai/whisper-large-v3 |
| datasets: |
| - erikbozik/slovak-plenary-asr-corpus |
| metrics: |
| - wer |
| model-index: |
| - name: whisper-large-v3-sk |
| results: |
| - task: |
| type: automatic-speech-recognition |
| name: Automatic Speech Recognition |
| dataset: |
| name: Common Voice 21 (Slovak test set) |
| type: common_voice |
| metrics: |
| - name: WER |
| type: wer |
| value: 11.6 |
| - task: |
| type: automatic-speech-recognition |
| name: Automatic Speech Recognition |
| dataset: |
| name: FLEURS (Slovak test set) |
| type: fleurs |
| metrics: |
| - name: WER |
| type: wer |
| value: 5.5 |
| license: mit |
| --- |
| |
| # Whisper Large-v3 — Fine-tuned on Slovak Plenary ASR Corpus |
|
|
| This model is a fine-tuned version of [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3). |
| It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**. |
|
|
| - **Language:** Slovak |
| - **Domain:** Parliamentary / formal speech |
| - **Training data:** 2,806 h |
| - **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts |
|
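A minimal usage sketch with the Transformers `pipeline` API follows. The repository id `erikbozik/whisper-large-v3-sk` is an assumption pieced together from the model name and dataset owner in the metadata, so substitute this model's actual Hub id. The helper names `build_transcriber` and `transcribe` are hypothetical.

```python
# Assumed Hub id; replace with the real repository id of this model.
MODEL_ID = "erikbozik/whisper-large-v3-sk"

# Force Slovak transcription instead of relying on language auto-detection.
GENERATE_KWARGS = {"language": "slovak", "task": "transcribe"}


def build_transcriber(model_id: str = MODEL_ID, device: str = "cpu"):
    """Create an ASR pipeline; audio longer than 30 s is processed in chunks."""
    # Imported lazily so this module can be loaded without the heavy dependency.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,
        chunk_length_s=30,
    )


def transcribe(audio_path: str, asr=None) -> str:
    """Transcribe one audio file and return the recognized text."""
    asr = asr or build_transcriber()
    return asr(audio_path, generate_kwargs=GENERATE_KWARGS)["text"]
```

Calling `transcribe("speech.wav")` (a hypothetical file name) downloads the checkpoint on first use; pass `device="cuda:0"` to `build_transcriber` when a GPU is available.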
|
| ## 🧪 Evaluation |
|
|
| | Dataset | Base WER | Fine-tuned WER | Δ (abs) | |
| |---|---:|---:|---:| |
| | Common Voice 21 (sk) | 20.8 | **11.6** | -9.2 | |
| | FLEURS (sk) | 9.2 | **5.5** | -3.7 | |
|
|
| *WER in %, lower is better. Numbers are from the paper’s final benchmark runs.* |
|
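For readers who want to reproduce this kind of comparison, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below is a plain-Python illustration, not the paper's evaluation pipeline, which may apply text normalization that is not shown here. The function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(base_wer: float, tuned_wer: float) -> float:
    """Fraction of the baseline error removed by fine-tuning."""
    return (base_wer - tuned_wer) / base_wer
```

Applied to the table above, `relative_reduction(20.8, 11.6)` and `relative_reduction(9.2, 5.5)` correspond to relative WER reductions of roughly 44% on Common Voice 21 and 40% on FLEURS.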
|
| ## 🔧 Training Details |
|
|
| - **Framework:** Hugging Face Transformers |
| - **Hardware:** Multi-GPU setup (NVIDIA A10s) with Fully Sharded Data Parallel (FSDP) |
| - **Epochs:** ~2 with early stopping on validation WER |
| - **Learning rate:** `1e-5` with weight decay `0.01` to limit overfitting |
| - **Notes:** Training required sharded checkpoints; evaluation was run separately due to runtime compatibility issues |
|
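The hyperparameters listed above could map onto Transformers training arguments roughly as follows. This is a sketch under assumptions, not the authors' training script: only the learning rate, weight decay, epoch count, and the use of FSDP come from this card. The output path, evaluation cadence, FSDP wrapping policy, and early-stopping patience are illustrative. Since the card notes that evaluation was run separately, the actual setup likely differed.

```python
# Values marked "from the card" are stated above; everything else is assumed.
TRAINING_KWARGS = {
    "output_dir": "whisper-large-v3-sk",  # hypothetical output path
    "learning_rate": 1e-5,                # from the card
    "weight_decay": 0.01,                 # from the card
    "num_train_epochs": 2,                # from the card (~2 epochs)
    "fsdp": "full_shard auto_wrap",       # FSDP is from the card; the mode is assumed
    "fsdp_config": {                      # assumed wrapping policy
        "transformer_layer_cls_to_wrap": ["WhisperEncoderLayer", "WhisperDecoderLayer"],
    },
    "eval_strategy": "steps",             # assumed; older releases use evaluation_strategy
    "eval_steps": 1000,                   # assumed
    "save_strategy": "steps",             # must match eval_strategy for best-model tracking
    "save_steps": 1000,                   # assumed
    "load_best_model_at_end": True,       # required by EarlyStoppingCallback
    "metric_for_best_model": "wer",       # expects compute_metrics to return a "wer" key
    "greater_is_better": False,           # lower WER is better
    "predict_with_generate": True,        # decode text during evaluation to score WER
}


def build_training_setup(patience: int = 3):
    """Return (training arguments, callbacks); patience is a hypothetical value."""
    # Imported lazily so the config can be inspected without Transformers installed.
    from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

    args = Seq2SeqTrainingArguments(**TRAINING_KWARGS)
    callbacks = [EarlyStoppingCallback(early_stopping_patience=patience)]
    return args, callbacks
```

The returned pair would be passed to a `Seq2SeqTrainer` together with the model, datasets, a data collator, and a `compute_metrics` function that reports `wer`.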
|
| ## ⚠️ Limitations |
|
|
| - Domain bias toward parliamentary speech (e.g., political vocabulary, formal register). |
| - As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time. |
| - Performance on languages other than Slovak is not guaranteed (full-parameter fine-tuning emphasized Slovak). |
|
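The hallucination safeguards mentioned above can be sketched in two parts: generation settings that enable temperature fallback, and a post-hoc repetition check based on the zlib compression ratio that Whisper-style decoders use. The threshold values mirror the defaults of the reference `openai-whisper` implementation and are starting points, not values tuned for this model. The names `FALLBACK_GENERATE_KWARGS`, `compression_ratio`, and `looks_repetitive` are hypothetical.

```python
import zlib

# Settings accepted by Whisper's generate() in recent Transformers releases.
# Decoding is retried at the next temperature when a segment fails a threshold.
FALLBACK_GENERATE_KWARGS = {
    "language": "slovak",
    "task": "transcribe",
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    "compression_ratio_threshold": 2.4,  # above this, output is likely a repetition loop
    "logprob_threshold": -1.0,           # below this average log-probability, retry
    "no_speech_threshold": 0.6,          # treat low-confidence silence as no speech
}


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; repeated text compresses well."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Flag transcripts whose compression ratio suggests a repetition loop."""
    return bool(text) and compression_ratio(text) > threshold
```

The dictionary can be passed as `generate_kwargs` to a Transformers ASR pipeline call or unpacked into `model.generate`. `looks_repetitive` can then be used to flag transcripts for manual review. Very short transcripts rarely exceed the threshold, so the check is most informative on longer segments.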
|
| ## 📝 Citation & Paper |
| For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as: |
| ```bibtex |
| @misc{božík2025slopalspeech2800hourslovakspeech, |
| title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data}, |
| author={Erik Božík and Marek Šuppa}, |
| year={2025}, |
| eprint={2509.19270}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CL}, |
| url={https://arxiv.org/abs/2509.19270}, |
| } |
| ``` |
|
|
| ## 🙏 Acknowledgements |
|
|
| This work was supported by [**VÚB Banka**](https://www.vub.sk), which provided the GPU resources and backing needed to carry it out, enabling progress in Slovak ASR research. |