---
license: apache-2.0
language:
- ro
pipeline_tag: text-to-speech
---

# Romanian TTS Model (Finetuned)

This is a **FastPitch** model for the Romanian language. The base model was trained **from scratch** on the SWARA dataset and then finetuned on samples from two specific speakers (BAS and SGS).

## Model Details
- **Architecture:** FastPitch
- **Language:** Romanian (ro)
- **Base Dataset:** The SWARA Speech Corpus (18k samples)
- **Base Model:** Trained on 16 speakers (male and female voices, balanced data). The base model components are in the `swara` directory.
- **Finetuning:** Finetuned on 2 speakers (BAS and SGS). Their checkpoints are in the `bas` and `sgs` directories.
- **Sample rate:** 22,050 Hz

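
Since the checkpoints operate at 22,050 Hz, audio generated with them should be written out at that same rate; a mismatched rate shifts pitch and duration on playback. The following is a minimal sketch using only the Python standard library. It assumes the synthesized waveform is available as mono float samples in the range [-1, 1]; `save_wav` and `wav_duration_seconds` are hypothetical helpers for illustration, not part of FastPitch, and the sine tone merely stands in for model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # Hz, as listed in the model details


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # guard against out-of-range values
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


def wav_duration_seconds(path):
    """Return the duration implied by the file's frame count and rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Stand-in for model output (assumption): one second of a 220 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
```

Calling `save_wav("out.wav", tone)` writes a one-second file; replacing `tone` with the actual synthesized samples is the only change needed, provided they are already scaled to [-1, 1].
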
## Usage instructions
- **Usage instructions are included in the official FastPitch repository:** https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch
- **Our repository on finetuning various TTS models for the Romanian language:** https://gitlab.com/opentts_ragman/OpenTTS

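
The checkpoints are split across the `swara`, `bas`, and `sgs` directories, so it can be convenient to fetch only the voice you need before running the FastPitch code linked above. The sketch below assumes the files are hosted on the Hugging Face Hub with those directories at the repository root; it uses `snapshot_download` from the `huggingface_hub` package with its `allow_patterns` filter. The names `VOICE_DIRS`, `patterns_for`, and `download_voice` are hypothetical, and the repository id in the comment is a placeholder, not the real one.

```python
# Map a voice label to the directory that holds its checkpoint.
# The directory names come from the model details; the labels are illustrative.
VOICE_DIRS = {"base": "swara", "bas": "bas", "sgs": "sgs"}


def patterns_for(voice):
    """Return glob patterns that match only the chosen voice's directory."""
    if voice not in VOICE_DIRS:
        raise ValueError(
            f"unknown voice {voice!r}; expected one of {sorted(VOICE_DIRS)}"
        )
    return [f"{VOICE_DIRS[voice]}/*"]


def download_voice(repo_id, voice):
    """Download one voice's files and return the local snapshot path."""
    # Imported here so the helper above works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=repo_id, allow_patterns=patterns_for(voice))


# Example call (placeholder repository id, replace with the actual one):
# local_dir = download_voice("your-namespace/your-repo", "bas")
```

The returned path points to a local snapshot containing only the selected directory, which can then be passed to the FastPitch inference scripts as the checkpoint location.
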
## Citation

If you use this model, please cite the original FastPitch paper and the SWARA dataset:

```bibtex
@inproceedings{fastpitch,
  author    = {Łańcucki, Adrian},
  title     = {{FastPitch: Parallel Text-to-Speech with Pitch Prediction}},
  booktitle = {Proc. of ICASSP},
  year      = {2021},
  pages     = {6588-6592},
  keywords  = {Frequency synthesizers;Frequency modulation;Conferences;Semantics;Predictive models;Real-time systems;Acoustics;text-to-speech;speech synthesis;fundamental frequency},
  doi       = {10.1109/ICASSP39728.2021.9413889}
}

@inproceedings{stan_sped2017,
  author    = {Stan, Adriana and Dinescu, Florina and Tiple, Cristina and Meza, Serban and Orza, Bogdan and Chirila, Magdalena and Giurgiu, Mircea},
  title     = {{The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset}},
  booktitle = {{Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD)}},
  year      = 2017,
  address   = {Bucharest, Romania},
  month     = {July, 6-9},
}
```

If you use this specific finetuned checkpoint in your work, please cite it as follows:

```bibtex
@article{11269795,
  author   = {Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  title    = {How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  journal  = {IEEE Access},
  year     = {2025},
  volume   = {13},
  pages    = {203415-203428},
  keywords = {Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi      = {10.1109/ACCESS.2025.3637322}
}
```