---
base_model: openai/whisper-medium
datasets:
- erikbozik/slovak-plenary-asr-corpus
language:
- sk
license: mit
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
model-index:
- name: whisper-medium-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - type: wer
      value: 18.0
      name: WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - type: wer
      value: 7.6
      name: WER
---
# Whisper Medium — Fine-tuned on SloPalSpeech
This model is a fine-tuned version of [`openai/whisper-medium`](https://huggingface.co/openai/whisper-medium), presented in the paper [SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data](https://huggingface.co/papers/2509.19270).
It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.
- **Language:** Slovak
- **Domain:** Parliamentary / formal speech
- **Training data:** 2,806 h
- **Intended use:** Slovak speech recognition; strongest in formal/public-speaking contexts
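A minimal usage sketch with the `transformers` pipeline. The repository id `erikbozik/whisper-medium-sk` is an assumption inferred from this card's name and the dataset author; replace it with the actual model id if it differs:

```python
from transformers import pipeline

# Assumed repository id (inferred from this card's name) -- adjust if needed.
MODEL_ID = "erikbozik/whisper-medium-sk"


def transcribe(audio_path: str) -> str:
    """Transcribe a Slovak audio file with the fine-tuned Whisper model."""
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # match the <=30 s segments used in training
    )
    return asr(audio_path)["text"]


# Usage: print(transcribe("plenary_clip.wav"))
```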
## 🧪 Evaluation
| Dataset | Base WER | Fine-tuned WER | Δ (abs) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 38.0 | **18.0** | -20.0 |
| FLEURS (sk) | 18.7 | **7.6** | -11.1 |
*Numbers from the paper’s final benchmark runs.*
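WER in the table is word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. The paper's exact text normalization is not specified on this card, so the sketch below computes plain whitespace-token WER:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length.

    Note: no text normalization is applied here; reported benchmark
    numbers may use additional normalization (casing, punctuation).
    """
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over word tokens.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (0 if words match)
            prev = cur
    return d[-1] / len(ref)


# One substituted word out of four -> WER 0.25
print(wer("dobrý deň vážené kolegyne", "dobrý deň vážené kolegovia"))
```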
## 🔧 Training Details
- **Framework:** Hugging Face Transformers
- **Hardware:** NVIDIA A10 GPUs
- **Epochs:** up to 3 with early stopping on validation WER
- **Learning rate:** ~**40× smaller** than Whisper pretraining LR
## ⚠️ Limitations
- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
- Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).
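The temperature-fallback and compression-ratio checks mentioned above can be enabled through Whisper's `generate` arguments in recent `transformers` releases. A hedged sketch (the repository id is assumed from this card's name; the thresholds are the defaults from the original Whisper decoding heuristics):

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "erikbozik/whisper-medium-sk"  # assumed repository id


def transcribe_with_fallback(audio, sampling_rate: int = 16_000) -> str:
    """Decode with temperature fallback to reduce hallucinated output."""
    processor = WhisperProcessor.from_pretrained(MODEL_ID)
    model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    generated = model.generate(
        inputs.input_features,
        language="sk",
        task="transcribe",
        # Retry decoding at progressively higher temperatures whenever
        # the quality checks below fail for the current hypothesis.
        temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
        compression_ratio_threshold=2.4,  # flags repetitive (hallucinated) text
        logprob_threshold=-1.0,           # flags low-confidence decodes
    )
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```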
## 📝 Citation & Paper
For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270) or the [Hugging Face paper page](https://huggingface.co/papers/2509.19270). If you use this model in your work, please cite it as:
```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
author={Erik Božík and Marek Šuppa},
year={2025},
eprint={2509.19270},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.19270},
}
```
## 🙏 Acknowledgements
This work was supported by [**VÚB Banka**](https://www.vub.sk), which provided the GPU resources and backing necessary for this research, enabling progress in Slovak ASR research.