	---
	license: apache-2.0
	language:
	- ro
	pipeline_tag: text-to-speech
	---

	# Romanian TTS Model (Finetuned)

	This is a FastPitch text-to-speech model for the Romanian language. The base model was trained from scratch on the SWARA dataset and then finetuned on samples from two individual speakers (BAS and SGS).

	## Model Details
	- Architecture: FastPitch
	- Language: Romanian (ro)
	- Base Dataset: The SWARA Speech Corpus (18k samples)
	- Base Model: trained on 16 speakers (male and female voices, balanced data). The base model components can be found in the 'swara' directory.
	- Finetuning: finetuned on 2 speakers (bas and sgs). Their checkpoints can be found in the 'bas' and 'sgs' directories.
	- Sample rate: 22050 Hz
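
The sample rate above matters whenever audio is written to disk or checked before finetuning. The following is a minimal sketch using only the Python standard library. The helper names `write_wav` and `wav_sample_rate` are hypothetical, and the mono 16-bit PCM format is an assumption that this card does not state.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # Hz, as listed in the model details


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] as mono 16-bit PCM (assumed format)."""
    frames = b"".join(
        # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumption)
        wav.setsampwidth(2)            # 2 bytes = 16-bit PCM (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def wav_sample_rate(path):
    """Return the sample rate stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


if __name__ == "__main__":
    # One second of a 440 Hz tone stands in for synthesized audio.
    tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    write_wav("tone.wav", tone)
    print(wav_sample_rate("tone.wav"))
```

Audio recorded at a different rate would need resampling to 22050 Hz before being compared with, or used alongside, the model output.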

	## Usage instructions
	- Usage instructions are included in the official FastPitch repository: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch
	- Our repository on finetuning various TTS models for the Romanian language: https://gitlab.com/opentts_ragman/OpenTTS
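
Before running the scripts from the repositories above, the checkpoints need to be fetched locally. The sketch below shows one way to download a single voice directory with `huggingface_hub.snapshot_download`. It assumes the repository id `TeodoraR/Ro_FastPitch` and the directory names `swara`, `bas`, and `sgs` mentioned in this card; the helper names `voice_patterns` and `download_voice` are hypothetical, and the file names inside each directory are not assumed.

```python
REPO_ID = "TeodoraR/Ro_FastPitch"   # assumed repository id for this model card
VOICES = ("swara", "bas", "sgs")    # directories named in the model details


def voice_patterns(voice):
    """Return the glob patterns that select one voice directory."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}, expected one of {VOICES}")
    return [f"{voice}/*"]


def download_voice(voice, local_dir=None):
    """Download one voice directory and return the local snapshot path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=voice_patterns(voice),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Fetch the BAS finetuned checkpoint files; this call requires network access.
    print(download_voice("bas"))
```

The returned path can then be passed to the inference scripts of the FastPitch repository, following that repository's own instructions.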

	## Citation

	If you use this model, please cite the original FastPitch paper and the SWARA dataset:

	```bibtex
	@INPROCEEDINGS{fastpitch,
	author={Łańcucki, Adrian},
	booktitle={Proc. of ICASSP},
	title={{Fastpitch: Parallel Text-to-Speech with Pitch Prediction}},
	year={2021},
	volume={},
	number={},
	pages={6588-6592},
	keywords={Frequency synthesizers;Frequency modulation;Conferences;Semantics;Predictive models;Real-time systems;Acoustics;text-to-speech;speech synthesis;fundamental frequency},
	doi={10.1109/ICASSP39728.2021.9413889}}

	@inproceedings{stan_sped2017,
	author = {Stan, Adriana and Dinescu, Florina and Tiple, Cristina and Meza, Serban and Orza, Bogdan and Chirila, Magdalena and Giurgiu, Mircea},
	title = {{The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset}},
	year = 2017,
	address = {Bucharest, Romania},
	booktitle = {{Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD)}},
	month = {July, 6-9},
	}
	```

	If you use this specific finetuned checkpoint in your work, please cite it as follows:

	```bibtex
	@ARTICLE{11269795,
	author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
	journal={IEEE Access},
	title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
	year={2025},
	volume={13},
	number={},
	pages={203415-203428},
	keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
	doi={10.1109/ACCESS.2025.3637322}}

	```