---
language:
- sk
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
base_model: openai/whisper-small
datasets:
- erikbozik/slovak-plenary-asr-corpus
metrics:
- wer
model-index:
- name: whisper-small-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - name: WER
      type: wer
      value: 25.7
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - name: WER
      type: wer
      value: 10.6
license: mit
---

# Whisper Small — Fine-tuned on Slovak Plenary ASR Corpus

This model is a fine-tuned version of [`openai/whisper-small`](https://huggingface.co/openai/whisper-small).  
It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.

- **Language:** Slovak  
- **Domain:** Parliamentary / formal speech  
- **Training data:** 2,806 h
- **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts
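
A minimal inference sketch using the Transformers `pipeline` API. The repository id `erikbozik/whisper-small-sk` is an assumption pieced together from the `model-index` name and the dataset owner, so substitute the actual id if it differs. The helper names are illustrative only.

```python
# Assumption: the Hub repository id below is inferred from the model-index
# name and the dataset owner; replace it with the real id if it differs.
MODEL_ID = "erikbozik/whisper-small-sk"


def pipeline_kwargs(model_id: str = MODEL_ID) -> dict:
    """Arguments for transformers.pipeline() for Slovak transcription."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # Whisper works on 30 s windows; chunking handles longer recordings.
        "chunk_length_s": 30,
        # Pin language and task so Whisper neither auto-detects nor translates.
        "generate_kwargs": {"language": "slovak", "task": "transcribe"},
    }


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file (needs transformers, torch and ffmpeg)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    asr = pipeline(**pipeline_kwargs())
    return asr(audio_path)["text"]
```

Calling `transcribe("speech.wav")` on a local recording (the file name is a placeholder) should return the Slovak transcript as a string. The first call downloads the model weights.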

## 🧪 Evaluation

| Dataset | Base WER | Fine-tuned WER | Δ (abs) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 58.4 | **25.7** | -32.7 |
| FLEURS (sk) | 36.1 | **10.6** | -25.5 |

*WER is given in percent (lower is better); Δ is the absolute change in percentage points. Numbers are from the paper's final benchmark runs.*
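
For readers unfamiliar with the metric, the following sketch shows how a word error rate and a relative reduction can be computed. It is a plain word-level edit distance with no text normalization, so it is not the evaluation pipeline behind the table and will not reproduce those numbers exactly. The function names are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(base: float, tuned: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (base - tuned) / base
```

Applied to the table, `relative_reduction(58.4, 25.7)` and `relative_reduction(36.1, 10.6)` give relative reductions of about 56% on Common Voice 21 and about 71% on FLEURS.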

## 🔧 Training Details

- **Framework:** Hugging Face Transformers  
- **Hardware:** NVIDIA A10 GPUs  
- **Epochs:** up to 3 with early stopping on validation WER  
- **Learning rate:** roughly **40× smaller** than the learning rate used in Whisper pretraining
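
As a rough illustration of the last two points, the sketch below derives an approximate fine-tuning learning rate and shows one simple early-stopping rule on validation WER. The pretraining value of `5e-4` is an assumption taken from the hyperparameter table of the Whisper paper for the small model; it is not stated on this card. The `patience` parameter and the stopping rule are likewise hypothetical, not the exact training configuration.

```python
# Assumption: peak pretraining learning rate for Whisper small, as listed in
# the Whisper paper's hyperparameter table; not stated on this card.
WHISPER_SMALL_PRETRAIN_LR = 5e-4
REDUCTION_FACTOR = 40  # "roughly 40x smaller", so the result is approximate


def finetune_lr(pretrain_lr: float = WHISPER_SMALL_PRETRAIN_LR,
                factor: float = REDUCTION_FACTOR) -> float:
    """Approximate fine-tuning learning rate."""
    return pretrain_lr / factor


def should_stop(val_wers: list[float], patience: int = 1) -> bool:
    """True once validation WER has not improved for `patience` evaluations."""
    if len(val_wers) <= patience:
        return False
    best_before = min(val_wers[:-patience])
    return min(val_wers[-patience:]) >= best_before
```

Under these assumptions the fine-tuning learning rate comes out near `1.25e-5`, which is in the range commonly used for full-parameter Whisper fine-tuning.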

## ⚠️ Limitations

- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).  
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.  
- Performance on languages other than Slovak is not guaranteed, because full-parameter fine-tuning emphasized Slovak.
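
The hallucination safeguards mentioned above can be sketched as follows. The compression-ratio check mirrors the heuristic used in the original Whisper code (UTF-8 length divided by zlib-compressed length), and the generation options are ones accepted by Whisper's `generate()` in recent Transformers releases. The threshold values are commonly used defaults, not settings from the paper, and support for the options may vary by library version.

```python
import zlib

# Decoding options for Whisper's generate(); values are common defaults and
# an assumption here, not settings reported for this model.
FALLBACK_GENERATE_KWARGS = {
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # retry at higher temperatures
    "compression_ratio_threshold": 2.4,  # retry if the output is too repetitive
    "logprob_threshold": -1.0,           # retry if the average log-prob is too low
    "language": "slovak",
    "task": "transcribe",
}


def compression_ratio(text: str) -> float:
    """UTF-8 size divided by zlib-compressed size; high values mean repetition."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Flag transcripts whose compression ratio exceeds the threshold."""
    return compression_ratio(text) > threshold
```

`looks_repetitive` can also be applied as a post-hoc filter on transcripts, for example to flag segments where the model looped on a single word or phrase.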


## 📝 Citation & Paper
For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:
```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
      title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data}, 
      author={Erik Božík and Marek Šuppa},
      year={2025},
      eprint={2509.19270},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.19270}, 
}
```

## 🙏 Acknowledgements

This work was supported by [**VÚB Banka**](https://www.vub.sk), which provided the GPU resources and backing needed to carry it out, enabling progress in Slovak ASR research.