---
license: apache-2.0
language:
- ro
pipeline_tag: text-to-speech
---

# Romanian TTS Model (Finetuned)

This is a **FastPitch** text-to-speech model for the Romanian language. The base model was trained **from scratch** on the SWARA dataset and then finetuned on two individual speakers (BAS and SGS).

## Model Details
- **Architecture:** FastPitch
- **Language:** Romanian (ro)
- **Base Dataset:** The SWARA Speech Corpus (18k samples)
- **Base Model:** trained on 16 speakers (both male and female voices, balanced data). The base model components can be found in the 'swara' directory.
- **Finetuning:** finetuned on 2 speakers (bas and sgs). Their checkpoints can be found in the 'bas' and 'sgs' directories.
- **Sample rate:** 22050 Hz
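
Audio used for further finetuning or for comparison against synthesized output should generally match the 22050 Hz sample rate listed above. The following is a minimal sketch, using only the Python standard library, of how a WAV file's sample rate could be checked. The helper name `has_expected_rate` is hypothetical and not part of this repository or of FastPitch.

```python
import wave

# Sample rate stated in the model details above.
EXPECTED_SAMPLE_RATE = 22050


def has_expected_rate(path: str, expected: int = EXPECTED_SAMPLE_RATE) -> bool:
    """Return True if the WAV file at `path` uses the expected sample rate.

    Hypothetical helper for illustration; it only reads the WAV header.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected
```

Files that fail this check would need to be resampled with an audio tool of your choice before use.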

## Usage instructions
- **Included in the official FastPitch repository:** https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch
- **Our repository on finetuning various TTS models for the Romanian language:** https://gitlab.com/opentts_ragman/OpenTTS
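
Before following the instructions in the repositories above, the checkpoints need to be available locally. The sketch below shows one possible way to fetch a single checkpoint directory ('swara', 'bas', or 'sgs') with `snapshot_download` from the `huggingface_hub` package. It assumes `huggingface_hub` is installed; the function names and the `repo_id` argument are placeholders for illustration, since the identifier of this model repository is not stated here.

```python
# Checkpoint directories named in the model details above.
CHECKPOINT_DIRS = ("swara", "bas", "sgs")


def checkpoint_patterns(variant: str) -> list:
    """Build the file patterns that select one checkpoint directory."""
    if variant not in CHECKPOINT_DIRS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {CHECKPOINT_DIRS}")
    return [f"{variant}/*"]


def download_variant(repo_id: str, variant: str, local_dir: str = "checkpoints") -> str:
    """Download only the files of one checkpoint directory.

    `repo_id` is a placeholder: pass the identifier of this model repository.
    Returns the local path of the downloaded snapshot.
    """
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(variant),
        local_dir=local_dir,
    )
```

Restricting the download with `allow_patterns` avoids fetching the checkpoints of the other speakers when only one voice is needed.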

## Citation

If you use this model, please cite the original FastPitch paper and the SWARA dataset:

```bibtex
@INPROCEEDINGS{fastpitch,
  author={Łańcucki, Adrian},
  booktitle={Proc. of ICASSP}, 
  title={{Fastpitch: Parallel Text-to-Speech with Pitch Prediction}}, 
  year={2021},
  volume={},
  number={},
  pages={6588-6592},
  keywords={Frequency synthesizers;Frequency modulation;Conferences;Semantics;Predictive models;Real-time systems;Acoustics;text-to-speech;speech synthesis;fundamental frequency},
  doi={10.1109/ICASSP39728.2021.9413889}}

@inproceedings{stan_sped2017,
  author = {Stan, Adriana and Dinescu, Florina and Tiple, Cristina and Meza, Serban and Orza, Bogdan and Chirila, Magdalena and Giurgiu, Mircea},
  title = {{The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset}},
  year = 2017,
  address = {Bucharest, Romania},
  booktitle = {{Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD)}},
  month = {July, 6-9},
}
```

If you use this specific finetuned checkpoint in your work, please cite it as follows:

```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access}, 
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools}, 
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}}

```