Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Paper: arXiv:2105.06337
Romanian adaptation of Grad-TTS, trained on the SWARA 1.0 dataset.
This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, together with installation and usage instructions, is hosted on GitHub at adrianstanea/Ro-Grad-TTS.
The Romanian Grad-TTS package downloads the weights from this repository automatically as needed. To install and run Romanian TTS inference, follow the instructions in the GitHub repository named above.
| Model | Type | Description |
|---|---|---|
| swara | Baseline | Speaker-agnostic model trained on full SWARA dataset |

| Model | Speaker | Training Samples | Fine-tune Epochs | Use Case |
|---|---|---|---|---|
| bas_10 | BAS (Female) | 10 samples | 100 | Few-shot learning / Low-resource |
| bas_950 | BAS (Female) | 950 samples | 100 | Production-ready speaker |
| sgs_10 | SGS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
| sgs_950 | SGS (Male) | 950 samples | 100 | Production-ready speaker |
Vocoder: Universal HiFi-GAN vocoder
adrianstanea/Ro-Grad-TTS/
├── config.json              # Model hyperparameters
├── hifigan_config.json      # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt    # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1  # Universal HiFi-GAN
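The naming pattern in the listing above can be expanded into concrete checkpoint paths. The following is a minimal sketch, not part of the Ro-Grad-TTS package: `checkpoint_path` is a hypothetical helper, and it assumes that the first brace group in the file name is the number of training samples, the second is the number of fine-tuning epochs, and that the speaker folders sit under `models/`.

```python
from pathlib import PurePosixPath

# Values copied from the file listing; their meaning (samples, epochs)
# is an assumption based on the model table, not confirmed by the package.
SPEAKERS = {"bas", "sgs"}
SAMPLE_COUNTS = {10, 950}
EPOCHS = {15, 50, 100}
BASELINE = PurePosixPath("models/swara/grad-tts-base-1000.pt")


def checkpoint_path(speaker=None, samples=None, epochs=None):
    """Return the repo-relative path of a checkpoint (hypothetical helper).

    With no speaker given, the baseline SWARA checkpoint is returned.
    """
    if speaker is None:
        return BASELINE
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if samples not in SAMPLE_COUNTS:
        raise ValueError(f"unsupported sample count: {samples!r}")
    if epochs not in EPOCHS:
        raise ValueError(f"unsupported epoch count: {epochs!r}")
    name = f"grad-tts-{speaker}-{samples}_{epochs}.pt"
    return PurePosixPath("models") / speaker / name


if __name__ == "__main__":
    print(checkpoint_path())                  # baseline model
    print(checkpoint_path("bas", 950, 100))   # BAS speaker, 950 samples, 100 epochs
```

Validating against the listed values keeps a typo such as `samples=100` from silently producing a path to a file that the listing does not contain.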
If you use this Romanian adaptation in your research, please cite:
@ARTICLE{11269795,
author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
journal={IEEE Access},
title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
year={2025},
volume={13},
number={},
pages={203415-203428},
keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
doi={10.1109/ACCESS.2025.3637322}
}
@article{popov2021grad,
title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
journal={International Conference on Machine Learning},
year={2021}
}