NaiveNeuron
/

whisper-small-sk

Eval Results (legacy)

Model card Files Files and versions

whisper-small-sk / README.md

erikbozik's picture

Update README.md

43ed399 verified 6 months ago

|

history blame contribute delete

2.88 kB

	---
	language:
	- sk
	tags:
	- speech
	- asr
	- whisper
	- slovak
	- parliament
	- legal
	- politics
	base_model: openai/whisper-small
	datasets:
	- erikbozik/slovak-plenary-asr-corpus
	metrics:
	- wer
	model-index:
	- name: whisper-small-sk
	results:
	- task:
	type: automatic-speech-recognition
	name: Automatic Speech Recognition
	dataset:
	name: Common Voice 21 (Slovak test set)
	type: common_voice
	metrics:
	- name: WER
	type: wer
	value: 25.7
	- task:
	type: automatic-speech-recognition
	name: Automatic Speech Recognition
	dataset:
	name: FLEURS (Slovak test set)
	type: fleurs
	metrics:
	- name: WER
	type: wer
	value: 10.6
	license: mit
	---

	# Whisper Small — Fine-tuned on Slovak Plenary ASR Corpus

	This model is a fine-tuned version of [`openai/whisper-small`](https://huggingface.co/openai/whisper-small).
	It is adapted for Slovak ASR using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): 2,806 hours of aligned, ≤30 s speech–text pairs from official plenary sessions of the Slovak National Council.

	- Language: Slovak
	- Domain: Parliamentary / formal speech
	- Training data: 2,806 h
	- Intended use: Slovak speech recognition; strongest in formal/public-speaking contexts

	## 🧪 Evaluation

	\| Dataset \| Base WER \| Fine-tuned WER \| Δ (abs) \|
	\|---\|---:\|---:\|---:\|
	\| Common Voice 21 (sk) \| 58.4 \| 25.7 \| -32.7 \|
	\| FLEURS (sk) \| 36.1 \| 10.6 \| -25.5 \|

	Numbers from the paper’s final benchmark runs.

	## 🔧 Training Details

	- Framework: Hugging Face Transformers
	- Hardware: NVIDIA A10 GPUs
	- Epochs: up to 3 with early stopping on validation WER
	- Learning rate: ~40× smaller than Whisper pretraining LR

	## ⚠️ Limitations

	- Domain bias toward parliamentary speech (e.g., political vocabulary, formal register).
	- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
	- Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).


	## 📝 Citation & Paper
	For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:
	```bibtex
	@misc{božík2025slopalspeech2800hourslovakspeech,
	title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
	author={Erik Božík and Marek Šuppa},
	year={2025},
	eprint={2509.19270},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2509.19270},
	}
	```

	## 🙏 Acknowledgements

	This work was supported by [VÚB Banka](https://www.vub.sk) who provided the GPU resources and backing necessary to accomplish it, enabling progress in Slovak ASR research.