TeodoraR
/

Ro_VITS

Text-to-Speech

Romanian


TeodoraR committed on Jan 30

Commit

de9977d

1 Parent(s): 79df7d1

create readme file

README.md +62 -0

	@@ -0,0 +1,62 @@

+---
+license: apache-2.0
+language:
+- ro
+pipeline_tag: text-to-speech
+---
+# Romanian TTS Model (Finetuned)
+This is a **VITS** text-to-speech model for the Romanian language. The base model was trained **from scratch** on the SWARA dataset and then finetuned on samples from two individual speakers (BAS and SGS).
+## Model Details
+- **Architecture:** VITS
+- **Language:** Romanian (ro)
+- **Base Dataset:** The SWARA Speech Corpus (18k samples)
+- **Base Model:** trained on 16 speakers (male and female voices, balanced data). The base model files are in the `swara` directory.
+- **Finetuning:** finetuned on 2 speakers (`bas` and `sgs`). Their checkpoints are in the `bas` and `sgs` directories.
+- **Sample rate:** 22050 Hz
+## Usage instructions
+- **Usage instructions are included in the official VITS repository:** https://github.com/jaywalnut310/vits.git
+- **Our repository on finetuning various TTS models for the Romanian language:** https://gitlab.com/opentts_ragman/OpenTTS
+## Citation
+If you use this model, please cite the original VITS paper and the SWARA dataset:
+```bibtex
+@inproceedings{kim2021vits,
+  title={{Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech}},
+  author={Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
+  booktitle={{Proceedings of the 38th International Conference on Machine Learning}},
+  series={Proceedings of Machine Learning Research},
+  volume={139},
+  pages={5530--5540},
+  year={2021}
+}
+@inproceedings{stan_sped2017,
+  author = {Stan, Adriana and Dinescu, Florina and Tiple, Cristina and Meza, Serban and Orza, Bogdan and Chirila, Magdalena and Giurgiu, Mircea},
+  title = {{The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset}},
+  year = 2017,
+  address = {Bucharest, Romania},
+  booktitle = {{Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD)}},
+  month = {July, 6-9},
+}
+```
+If you use this specific finetuned checkpoint in your work, please cite it as follows:
+```bibtex
+@ARTICLE{11269795,
+  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
+  journal={IEEE Access},
+  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
+  year={2025},
+  volume={13},
+  number={},
+  pages={203415-203428},
+  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
+  doi={10.1109/ACCESS.2025.3637322}}
+```
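As a note on the usage section of the README above, here is a minimal Python sketch of how one of the three checkpoint directories could be fetched and how synthesized audio could be saved at the stated 22050 Hz sample rate. It is an illustration under assumptions, not part of the commit: the helper names (`checkpoint_patterns`, `download_checkpoint`, `save_wav`) are hypothetical, `huggingface_hub` is a third-party package that must be installed separately, and because the README does not list the file names inside `swara`, `bas`, and `sgs`, the sketch simply downloads everything in the chosen directory. Synthesis itself is left to the code in the official VITS repository.

```python
import struct
import wave
from pathlib import Path
from typing import Iterable, List

REPO_ID = "TeodoraR/Ro_VITS"               # repository shown on this page
SAMPLE_RATE = 22050                        # sample rate stated in the model card
CHECKPOINT_DIRS = ("swara", "bas", "sgs")  # directories named in the model card


def checkpoint_patterns(voice: str) -> List[str]:
    """Return glob patterns that select one checkpoint directory of the repo."""
    if voice not in CHECKPOINT_DIRS:
        raise ValueError(f"unknown voice {voice!r}; expected one of {CHECKPOINT_DIRS}")
    # File names inside the directory are not documented, so match all of them.
    return [f"{voice}/*"]


def download_checkpoint(voice: str) -> Path:
    """Download one checkpoint directory and return its local path (needs network)."""
    # Imported lazily so the rest of the module works without the package.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(voice),
    )
    return Path(local_root) / voice


def save_wav(samples: Iterable[float], path, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = b"".join(
        # Clip to [-1, 1] before scaling to the signed 16-bit range.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

For example, `download_checkpoint("sgs")` would return the local path of the `sgs` directory, and the waveform produced by the VITS inference code could then be passed to `save_wav(audio, "out.wav")`, assuming it is a sequence of floats in the range [-1, 1].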