---
license: apache-2.0
language:
- ro
tags:
- text-to-speech
- Grad-TTS
- Diffusion
library_name: pytorch
datasets:
- SWARA-1.0
---
# Ro-Grad-TTS: Romanian Text-to-Speech
Romanian adaptation of [Grad-TTS](https://arxiv.org/abs/2105.06337), trained on the [SWARA 1.0 dataset](https://speech.utcluj.ro/swarasc/).
## Quick Start
This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, together with installation and usage instructions, is hosted on GitHub at [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git).
The package downloads the weights from this repository automatically as needed. To install and run Romanian TTS inference, follow the instructions in the GitHub repository linked above.
## Details
- **Architecture**: Grad-TTS (diffusion-based TTS)
- **Language**: Romanian
- **Phonemization**: Espeak-ng
- **Vocoder**: HiFi-GAN (universal v1)
- **Sample rate**: 22050 Hz
- **Training data**: SWARA 1.0 Romanian speech corpus
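Audio produced with these models should be saved at the 22050 Hz sample rate listed above. The following is a minimal sketch using only the Python standard library; `write_wav` is a hypothetical helper, not part of the Ro-Grad-TTS package, and the sine tone merely stands in for a real vocoder output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 22050  # Hz, as listed in the model details


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM.

    `target` may be a file path or a writable, seekable binary file object.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Placeholder waveform (assumption): one second of a 440 Hz tone
# instead of synthesized speech.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
buffer = io.BytesIO()
write_wav(buffer, tone)
```

Writing with a different frame rate than 22050 Hz would change the playback speed and pitch, so the constant is kept in one place.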
## Available Models
### Baseline Model
| Model | Type | Description |
| --------- | -------- | ---------------------------------------------------- |
| **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |
### Fine-tuned Speaker Models
| Model | Speaker | Training Samples | Fine-tune Epochs | Use Case |
| ----------- | ------------ | ---------------- | ---------------- | -------------------------------- |
| **bas_10** | BAS (Female) | 10 samples | 100 | Few-shot learning / Low-resource |
| **bas_950** | BAS (Female) | 950 samples | 100 | Production-ready speaker |
| **sgs_10** | SGS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
| **sgs_950** | SGS (Male) | 950 samples | 100 | Production-ready speaker |
**Vocoder**: all models share the universal HiFi-GAN (v1) vocoder.
## Repository Structure
```sh
adrianstanea/Ro-Grad-TTS/
├── config.json                  # Model hyperparameters
├── hifigan_config.json          # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt    # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1          # Universal HiFi-GAN
```
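The file naming pattern in the tree can be captured in a small lookup helper. This is a minimal sketch, not the official package API: `checkpoint_path` and `fetch_checkpoint` are hypothetical names, the Hub repo id is assumed to match the root shown in the tree, and the two brace groups are assumed to encode the number of fine-tuning samples and the number of fine-tuning epochs, in that order. The download itself uses `hf_hub_download` from the `huggingface_hub` library.

```python
from typing import Optional

REPO_ID = "adrianstanea/Ro-Grad-TTS"  # assumption: taken from the tree root
SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)   # assumed meaning of the first brace group
EPOCHS = (15, 50, 100)      # assumed meaning of the second brace group


def checkpoint_path(speaker: Optional[str] = None,
                    samples: int = 950,
                    epochs: int = 100) -> str:
    """Return the in-repo path of a checkpoint, following the tree layout.

    With no speaker, the baseline SWARA checkpoint is returned.
    """
    if speaker is None:
        return "models/swara/grad-tts-base-1000.pt"
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}, expected one of {SPEAKERS}")
    if samples not in SAMPLE_COUNTS:
        raise ValueError(f"unsupported sample count {samples}")
    if epochs not in EPOCHS:
        raise ValueError(f"unsupported epoch count {epochs}")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"


def fetch_checkpoint(speaker: Optional[str] = None,
                     samples: int = 950,
                     epochs: int = 100) -> str:
    """Download one checkpoint and return its local cache path."""
    # Imported lazily so checkpoint_path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_path(speaker, samples, epochs),
    )
```

For example, `checkpoint_path("bas", 10, 100)` resolves to `models/bas/grad-tts-bas-10_100.pt`. The Ro-Grad-TTS package handles downloads on its own, so a helper like this is only useful for inspecting or caching weights manually.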
## Citation
If you use this Romanian adaptation in your research, please cite:
```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
journal={IEEE Access},
title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
year={2025},
volume={13},
number={},
pages={203415-203428},
keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
doi={10.1109/ACCESS.2025.3637322}
}
```
### Original Grad-TTS Citation
```bibtex
@article{popov2021grad,
title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
journal={International Conference on Machine Learning},
year={2021}
}
```
## References
- [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git) - Training, documentation, and research details
- [huawei-noah/Speech-Backbones](https://github.com/huawei-noah/Speech-Backbones) - Base architecture and paper