---

license: apache-2.0
language:
  - ro
tags:
  - text-to-speech
  - Grad-TTS
  - Diffusion
library_name: pytorch
datasets:
  - SWARA-1.0
---


# Ro-Grad-TTS: Romanian Text-to-Speech

Romanian adaptation of [Grad-TTS](https://arxiv.org/abs/2105.06337), trained on the [SWARA 1.0 dataset](https://speech.utcluj.ro/swarasc/).

## Quick Start

This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, together with installation and usage instructions, is hosted on GitHub at [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git).

The package downloads the weights from this repository automatically as needed. To install it and run Romanian TTS inference, follow the instructions in the GitHub repository linked above.
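
If you want to fetch a checkpoint by hand rather than through the package, the sketch below shows one way to do it with `huggingface_hub`. It is not part of the Ro-Grad-TTS package: `fetch_weights` is a hypothetical helper, and the Hub repository id is assumed to match the root shown under Repository Structure.

```python
from pathlib import PurePosixPath

# Assumption: the Hub repo id is the root shown in the "Repository Structure" tree.
REPO_ID = "adrianstanea/Ro-Grad-TTS"


def fetch_weights(relative_path: str, repo_id: str = REPO_ID) -> str:
    """Download one file from the Hub (or reuse the local cache) and return its local path.

    `relative_path` is a path inside the repository,
    e.g. "models/swara/grad-tts-base-1000.pt".
    """
    path = PurePosixPath(relative_path)
    # Reject empty, absolute, or parent-escaping paths before touching the network.
    if not relative_path or path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a path relative to the repo root, got {relative_path!r}")

    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=str(path))


if __name__ == "__main__":
    # Requires network access and `pip install huggingface_hub`.
    print(fetch_weights("models/swara/grad-tts-base-1000.pt"))
```

The returned path points into the local Hugging Face cache, so repeated calls do not download the file again.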

## Details

- **Architecture**: Grad-TTS (diffusion-based TTS)
- **Language**: Romanian
- **Phonemization**: eSpeak NG
- **Vocoder**: HiFi-GAN (universal v1)
- **Sample rate**: 22050 Hz
- **Training data**: SWARA 1.0 Romanian speech corpus

## Available Models

### Baseline Model

| Model     | Type     | Description                                          |
| --------- | -------- | ---------------------------------------------------- |
| **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |

### Fine-tuned Speaker Models

| Model       | Speaker      | Training Samples | Fine-tune Epochs | Use Case                         |
| ----------- | ------------ | ---------------- | ---------------- | -------------------------------- |
| **bas_10**  | BAS (Female) | 10 samples       | 100              | Few-shot learning / Low-resource |
| **bas_950** | BAS (Female) | 950 samples      | 100              | Production-ready speaker         |
| **sgs_10**  | SGS (Male)   | 10 samples       | 100              | Few-shot learning / Low-resource |
| **sgs_950** | SGS (Male)   | 950 samples      | 100              | Production-ready speaker         |

**Vocoder**: All models use the universal HiFi-GAN (v1) vocoder.

## Repository Structure

```sh
adrianstanea/Ro-Grad-TTS/
├── config.json                                      # Model hyperparameters
├── hifigan_config.json                              # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt                    # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1                          # Universal HiFi-GAN
```
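
The brace patterns in the tree above expand to twelve speaker checkpoints. The following sketch maps the model names from the tables to repository paths. It assumes the file names read as `grad-tts-<speaker>-<samples>_<epochs>.pt` (the second number being fine-tune epochs is an interpretation of the layout, not something stated here), and all function names are hypothetical.

```python
from itertools import product

SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)
EPOCHS = (15, 50, 100)  # assumed to be fine-tune epochs
BASELINE = "models/swara/grad-tts-base-1000.pt"


def speaker_checkpoint(speaker: str, samples: int, epochs: int = 100) -> str:
    """Build the repo-relative path of one fine-tuned speaker checkpoint."""
    if speaker not in SPEAKERS or samples not in SAMPLE_COUNTS or epochs not in EPOCHS:
        raise ValueError(f"no checkpoint for {speaker=}, {samples=}, {epochs=}")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"


def checkpoint_for(model_name: str, epochs: int = 100) -> str:
    """Map a model name from the tables (e.g. "bas_950") to a repo-relative path."""
    if model_name == "swara":
        return BASELINE
    speaker, _, samples = model_name.partition("_")
    if not samples.isdigit():
        raise ValueError(f"unknown model name: {model_name!r}")
    return speaker_checkpoint(speaker, int(samples), epochs)


def all_speaker_checkpoints() -> list[str]:
    """Expand the {10,950}_{15,50,100} patterns for both speakers."""
    return [speaker_checkpoint(s, n, e) for s, n, e in product(SPEAKERS, SAMPLE_COUNTS, EPOCHS)]


if __name__ == "__main__":
    print(checkpoint_for("bas_950"))
    print(len(all_speaker_checkpoints()))
```

The default of 100 epochs follows the Fine-tune Epochs column in the speaker table; pass `epochs=15` or `epochs=50` to address the intermediate checkpoints.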

## Citation

If you use this Romanian adaptation in your research, please cite:

```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}
}
```

### Original Grad-TTS Citation

```bibtex
@article{popov2021grad,
  title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
  author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
  journal={International Conference on Machine Learning},
  year={2021}
}
```

## References

- [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git) - Training, documentation, and research details
- [huawei-noah/Speech-Backbones](https://github.com/huawei-noah/Speech-Backbones) - Base architecture and paper