NaiveNeuron/whisper-large-v3-sk


---
language:
- sk
tags:
- speech
- asr
- whisper
- slovak
- parliament
- legal
- politics
base_model: openai/whisper-large-v3
datasets:
- erikbozik/slovak-plenary-asr-corpus
metrics:
- wer
model-index:
- name: whisper-large-v3-sk
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 21 (Slovak test set)
      type: common_voice
    metrics:
    - name: WER
      type: wer
      value: 11.6
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS (Slovak test set)
      type: fleurs
    metrics:
    - name: WER
      type: wer
      value: 5.5
license: mit
---

# Whisper Large-v3: Fine-tuned on the Slovak Plenary ASR Corpus

This model is a fine-tuned version of [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3).
It is adapted for Slovak ASR using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): 2,806 hours of aligned, ≤30 s speech–text pairs from official plenary sessions of the Slovak National Council.

- Language: Slovak
- Domain: Parliamentary / formal speech
- Training data: 2,806 h
- Intended use: Slovak speech recognition; strongest in formal/public-speaking contexts
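
For Slovak transcription, a minimal inference sketch with the Transformers `pipeline` API might look like the following. It assumes `transformers` and `torch` are installed and that the Hub ID is `NaiveNeuron/whisper-large-v3-sk` (taken from the repository path); the helper name `build_transcriber` is hypothetical and not part of this repository.

```python
# Hypothetical helper: builds an ASR pipeline for the fine-tuned Slovak model.
# Assumes `transformers` and `torch` are installed; imports are deferred so this
# module can be imported without them.


def build_transcriber(model_id: str = "NaiveNeuron/whisper-large-v3-sk"):
    """Return a Transformers automatic-speech-recognition pipeline."""
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=0 if torch.cuda.is_available() else -1,  # -1 selects the CPU
        chunk_length_s=30,  # Whisper's 30 s input window; longer audio is chunked
    )
```

A call would then be `asr = build_transcriber()` followed by `asr("speech.wav", generate_kwargs={"language": "slovak", "task": "transcribe"})["text"]`, where `speech.wav` is a placeholder path (decoding audio files this way requires `ffmpeg`). Pinning the language and task stops Whisper from auto-detecting the language or translating.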

## 🧪 Evaluation

| Dataset | Base WER (%) | Fine-tuned WER (%) | Δ (abs., pp) |
|---|---:|---:|---:|
| Common Voice 21 (sk) | 20.8 | 11.6 | -9.2 |
| FLEURS (sk) | 9.2 | 5.5 | -3.7 |

"Base" is `openai/whisper-large-v3` without fine-tuning; lower WER is better. Numbers are from the paper's final benchmark runs.
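
To show how WER and the Δ column relate, here is a plain-Python sketch. It computes WER as word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The lowercase-and-whitespace normalization is an assumption, since the benchmark's text normalizer is not described here, so scores from this sketch are not directly comparable to the table. The helper names `wer` and `improvement` are illustrative.

```python
from __future__ import annotations


def wer(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N over words; normalization here is an assumption."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def improvement(base: float, tuned: float) -> tuple[float, float]:
    """Absolute change in percentage points and relative reduction in percent."""
    return round(tuned - base, 1), round(100 * (base - tuned) / base, 1)
```

With the table's values, `improvement(20.8, 11.6)` gives `(-9.2, 44.2)` and `improvement(9.2, 5.5)` gives `(-3.7, 40.2)`. Fine-tuning therefore removes roughly 40–44% of the base model's word errors on both test sets.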

## 🔧 Training Details

- Framework: Hugging Face Transformers
- Hardware: multi-GPU setup (NVIDIA A10s) with Fully Sharded Data Parallel (FSDP)
- Epochs: ~2, with early stopping on validation WER
- Learning rate: `1e-5`, with weight decay `0.01` to reduce overfitting
- Notes: training required sharded checkpoints; evaluation was run separately because of runtime compatibility issues
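
The hyperparameters above can be written out roughly as follows. This is a hedged sketch, not the authors' training script. It maps the listed values onto keyword arguments of Transformers' `Seq2SeqTrainingArguments`. Anything this card does not state is an assumption: the `output_dir`, the FSDP mode string, the early-stopping patience, and the helper name `build_training_args`.

```python
# Hypothetical reconstruction of the card's hyperparameters; not the original script.
TRAINING_KWARGS = {
    "output_dir": "whisper-large-v3-sk",  # assumption: any local path
    "learning_rate": 1e-5,                # from the card
    "weight_decay": 0.01,                 # from the card
    "num_train_epochs": 2,                # card: "~2 with early stopping"
    "fsdp": "full_shard auto_wrap",       # assumption: exact FSDP mode not stated
    "load_best_model_at_end": True,       # keep the checkpoint with the best validation WER
    "metric_for_best_model": "wer",       # compute_metrics must return a "wer" key
    "greater_is_better": False,           # lower WER is better
    "predict_with_generate": True,        # generate text during eval so WER can be scored
}
EARLY_STOPPING_PATIENCE = 3  # assumption: patience is not given in the card


def build_training_args(**overrides):
    """Create Seq2SeqTrainingArguments; transformers is imported lazily."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

Selecting the best checkpoint also requires matching evaluation and save strategies (`eval_strategy` and `save_strategy`; older releases name the former `evaluation_strategy`). Early stopping would be attached with `EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)`. For multi-GPU FSDP, the script would be started with a distributed launcher such as `accelerate launch` or `torchrun`.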

## ⚠️ Limitations

- Domain bias toward parliamentary speech (political vocabulary, formal register); accuracy may be lower on conversational or informal speech.
- As with Whisper models in general, the model may occasionally hallucinate text (for example, repeated phrases); consider temperature fallback and compression-ratio checks at inference time.
- Multilingual performance is not guaranteed: full-parameter fine-tuning focused on Slovak, so quality in other languages may have degraded.
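
The temperature-fallback and compression-ratio checks can be sketched in plain Python. The defaults mirror OpenAI's reference `whisper` package: `compression_ratio_threshold=2.4`, with temperatures from 0.0 to 1.0 in steps of 0.2. `decode` is a hypothetical stand-in for whatever function runs the model at a given temperature. The reference implementation also applies a log-probability threshold, which this sketch omits.

```python
import zlib
from typing import Callable, Sequence


def compression_ratio(text: str) -> float:
    """UTF-8 length divided by zlib-compressed length; repetitive text compresses well."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], str],  # hypothetical: runs the model at temperature t
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    max_ratio: float = 2.4,
) -> str:
    """Retry at increasing temperatures until the output is not overly repetitive."""
    text = ""
    for t in temperatures:
        text = decode(t)
        if compression_ratio(text) <= max_ratio:
            return text
    return text  # every temperature failed the check; return the last attempt
```

Greedy decoding at temperature 0.0 is tried first, and sampling at higher temperatures is used only when the output looks degenerate. Sampling at a higher temperature can break a repetition loop.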

## 📝 Citation & Paper

For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:

```bibtex
@misc{božík2025slopalspeech2800hourslovakspeech,
  title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
  author={Erik Božík and Marek Šuppa},
  year={2025},
  eprint={2509.19270},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.19270},
}
```

## 🙏 Acknowledgements

This work was supported by [VÚB Banka](https://www.vub.sk), which provided the GPU resources and backing needed to carry it out, advancing Slovak ASR research.