---
license: apache-2.0
language:
- ro
tags:
- text-to-speech
- Grad-TTS
- Diffusion
library_name: pytorch
datasets:
- SWARA-1.0
---

# Ro-Grad-TTS: Romanian Text-to-Speech

Romanian adaptation of [Grad-TTS](https://arxiv.org/abs/2105.06337), trained on the [SWARA 1.0 dataset](https://speech.utcluj.ro/swarasc/).

## Quick Start

This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, together with installation and usage instructions, is hosted on GitHub at [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git).

The package downloads the weights from this repository automatically as needed. To install and run Romanian TTS inference, follow the instructions in the GitHub repository linked above.

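To illustrate roughly what fetching a checkpoint by hand could look like, the sketch below uses `hf_hub_download` from the `huggingface_hub` library. The repo id `adrianstanea/Ro-Grad-TTS` and the baseline checkpoint path are read off the Repository Structure section and are assumptions about the Hub layout. The helper `fetch_checkpoint` is hypothetical and is not part of the Ro-Grad-TTS package, which handles downloads itself.

```python
from pathlib import Path

# Assumed values, taken from the "Repository Structure" section of this card.
REPO_ID = "adrianstanea/Ro-Grad-TTS"
BASELINE_CHECKPOINT = "models/swara/grad-tts-base-1000.pt"


def fetch_checkpoint(filename: str = BASELINE_CHECKPOINT, repo_id: str = REPO_ID) -> Path:
    """Download one file from the Hub (or reuse the local cache) and return its local path.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that defining this helper does not require the library.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename))
```

Calling `fetch_checkpoint()` requires `pip install huggingface_hub` and network access. Later calls reuse the local Hub cache instead of downloading again.
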
## Details

- **Architecture**: Grad-TTS (diffusion-based TTS)
- **Language**: Romanian
- **Phonemization**: Espeak-ng
- **Vocoder**: HiFi-GAN (universal v1)
- **Sample rate**: 22050 Hz
- **Training data**: SWARA 1.0 Romanian speech corpus

## Available Models

### Baseline Model

| Model     | Type     | Description                                          |
| --------- | -------- | ---------------------------------------------------- |
| **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |

### Fine-tuned Speaker Models

| Model       | Speaker      | Training Samples | Fine-tune Epochs | Use Case                         |
| ----------- | ------------ | ---------------- | ---------------- | -------------------------------- |
| **bas_10**  | BAS (Female) | 10               | 100              | Few-shot learning / Low-resource |
| **bas_950** | BAS (Female) | 950              | 100              | Production-ready speaker         |
| **sgs_10**  | SGS (Male)   | 10               | 100              | Few-shot learning / Low-resource |
| **sgs_950** | SGS (Male)   | 950              | 100              | Production-ready speaker         |

**Vocoder**: Universal HiFi-GAN vocoder

## Repository Structure

```sh
adrianstanea/Ro-Grad-TTS/
├── config.json                      # Model hyperparameters
├── hifigan_config.json              # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt    # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1          # Universal HiFi-GAN
```

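The brace patterns in the tree can be read as shell-style expansions, giving one fine-tuned checkpoint per speaker for each combination of the two brace groups. The sketch below enumerates the resulting relative paths. It assumes that the first brace group is the number of training samples, that the second is the number of fine-tuning epochs, and that the speaker folders sit under `models/`. The tree does not state any of this explicitly, and the function names are hypothetical.

```python
from itertools import product

SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)      # assumed meaning of the first brace group
EPOCH_COUNTS = (15, 50, 100)   # assumed meaning of the second brace group


def finetuned_checkpoint(speaker: str, samples: int, epochs: int) -> str:
    """Build the relative path of one fine-tuned checkpoint (hypothetical helper)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if samples not in SAMPLE_COUNTS or epochs not in EPOCH_COUNTS:
        raise ValueError(f"no checkpoint listed for {samples} samples / {epochs} epochs")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"


def all_finetuned_checkpoints() -> list[str]:
    """Expand both brace groups for every speaker."""
    return [
        finetuned_checkpoint(speaker, samples, epochs)
        for speaker, samples, epochs in product(SPEAKERS, SAMPLE_COUNTS, EPOCH_COUNTS)
    ]


paths = all_finetuned_checkpoints()
print(len(paths))  # 12
print(paths[0])    # models/bas/grad-tts-bas-10_15.pt
```

Under that reading, the `bas_950` entry in the fine-tuned models table (100 epochs) would correspond to `models/bas/grad-tts-bas-950_100.pt`.
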
## Citation

If you use this Romanian adaptation in your research, please cite:

```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}
}
```

### Original Grad-TTS Citation

```bibtex
@inproceedings{popov2021grad,
  title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
  author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
  booktitle={International Conference on Machine Learning},
  year={2021}
}
```

## References

- [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git) - Training, documentation, and research details
- [huawei-noah/Speech-Backbones](https://github.com/huawei-noah/Speech-Backbones) - Base architecture and paper