Update README.md
README.md
CHANGED
|
@@ -43,7 +43,7 @@ license: mit
|
|
| 43 |
# Whisper Large-v3 — Fine-tuned on Slovak Plenary ASR Corpus
|
| 44 |
|
| 45 |
This model is a fine-tuned version of [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3).
|
| 46 |
-
It is adapted for **Slovak ASR** using
|
| 47 |
|
| 48 |
- **Language:** Slovak
|
| 49 |
- **Domain:** Parliamentary / formal speech
|
|
@@ -73,9 +73,19 @@ It is adapted for **Slovak ASR** using the [Slovak Plenary ASR Corpus](https://h
|
|
| 73 |
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
|
| 74 |
- Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).
|
| 75 |
|
| 76 |
-
##
|
| 77 |
-
|
| 78 |
-
|
| 79 |
|
| 80 |
## 🙏 Acknowledgements
|
| 81 |
|
|
|
|
| 43 |
# Whisper Large-v3 — Fine-tuned on Slovak Plenary ASR Corpus
|
| 44 |
|
| 45 |
This model is a fine-tuned version of [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3).
|
| 46 |
+
It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.
|
| 47 |
|
| 48 |
- **Language:** Slovak
|
| 49 |
- **Domain:** Parliamentary / formal speech
|
|
|
|
| 73 |
- As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
|
| 74 |
- Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).
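
The compression-ratio check mentioned in the list above can be sketched in a few lines. This is a minimal illustration, not part of this model's released code: the helper names `compression_ratio` and `looks_hallucinated` are hypothetical, and the threshold of 2.4 is assumed from the default used in OpenAI's reference Whisper implementation.

```python
import zlib

# Assumption: 2.4 mirrors the default threshold in OpenAI's reference Whisper code.
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Return raw UTF-8 size divided by zlib-compressed size.

    Highly repetitive text (a common hallucination pattern) compresses
    well and therefore yields a high ratio.
    """
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_hallucinated(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """Flag a decoded segment whose compression ratio exceeds the threshold."""
    return compression_ratio(text) > threshold
```

In a decoding loop, a segment flagged this way would typically be re-decoded at a higher sampling temperature (the temperature fallback), and the first result that passes the check would be kept.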
|
| 75 |
|
| 76 |
+
## 📝 Citation & Paper
|
| 77 |
+
For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:
|
| 78 |
+
```bibtex
|
| 79 |
+
@misc{božík2025slopalspeech2800hourslovakspeech,
|
| 80 |
+
title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
|
| 81 |
+
author={Erik Božík and Marek Šuppa},
|
| 82 |
+
year={2025},
|
| 83 |
+
eprint={2509.19270},
|
| 84 |
+
archivePrefix={arXiv},
|
| 85 |
+
primaryClass={cs.CL},
|
| 86 |
+
url={https://arxiv.org/abs/2509.19270},
|
| 87 |
+
}
|
| 88 |
+
```
|
| 89 |
|
| 90 |
## 🙏 Acknowledgements
|
| 91 |
|