erikbozik committed
Commit 377984b · verified · 1 Parent(s): 8375a67

Update README.md

Files changed (1)
  1. README.md +14 -4
README.md CHANGED
@@ -43,7 +43,7 @@ license: mit
 # Whisper Large-v3 — Fine-tuned on Slovak Plenary ASR Corpus
 
 This model is a fine-tuned version of [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3).
-It is adapted for **Slovak ASR** using the [Slovak Plenary ASR Corpus](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.
+It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.
 
 - **Language:** Slovak
 - **Domain:** Parliamentary / formal speech
@@ -73,9 +73,19 @@ It is adapted for **Slovak ASR** using the [Slovak Plenary ASR Corpus](https://h
 - As with Whisper models generally, occasional hallucinations may appear; consider temperature fallback / compression-ratio checks at inference time.
 - Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).
 
-## 📄 Paper & Citation
-
-Coming soon
+## 📝 Citation & Paper
+For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:
+```bibtex
+@misc{božík2025slopalspeech2800hourslovakspeech,
+title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},
+author={Erik Božík and Marek Šuppa},
+year={2025},
+eprint={2509.19270},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2509.19270},
+}
+```
 
 ## 🙏 Acknowledgements
 
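
The README text in the second hunk recommends temperature fallback and compression-ratio checks to limit hallucinations. The following is a minimal sketch of that idea, not the author's inference code. It assumes a hypothetical `decode(temperature)` callable that wraps whatever inference stack is in use and returns a transcript string. The threshold of 2.4 and the temperature ladder mirror the defaults of the reference `openai/whisper` decoder; whether they suit this fine-tuned model is an assumption. The reference decoder also checks average log-probability, which is omitted here.

```python
import zlib
from typing import Callable, Sequence

# Assumed values: these match the defaults of the reference openai/whisper
# decoder, not settings confirmed for this fine-tuned model.
COMPRESSION_RATIO_THRESHOLD = 2.4
TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size.

    Highly repetitive text compresses well, so a high ratio hints at a
    looping (hallucinated) transcript.
    """
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """True when the transcript is suspiciously compressible."""
    return compression_ratio(text) > threshold


def decode_with_fallback(
    decode: Callable[[float], str],  # hypothetical: temperature -> transcript
    temperatures: Sequence[float] = TEMPERATURES,
    threshold: float = COMPRESSION_RATIO_THRESHOLD,
) -> str:
    """Retry decoding at increasing temperatures until the check passes.

    If every attempt fails the check, the last attempt is returned.
    """
    text = ""
    for temperature in temperatures:
        text = decode(temperature)
        if not looks_repetitive(text, threshold):
            break
    return text
```

A caller would pass its own decoding function, for example a small wrapper around the model's generate call that accepts a temperature and returns the decoded string for one ≤30 s segment.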
91