adrianstanea committed on
Commit 21f6b3c · 1 Parent(s): a6a634a

Add model checkpoints and config files

Signed-off-by: adrianstanea <adrianstanea1@gmail.com>

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ models/vocoder/hifigan_univ_v1 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,101 @@
1
  ---
2
  license: apache-2.0
3
+ language:
4
+ - ro
5
+ tags:
6
+ - text-to-speech
7
+ - Grad-TTS
8
+ - Diffusion
9
+ library_name: pytorch
10
+ datasets:
11
+ - SWARA-1.0
12
  ---
13
+
14
+ # Ro-Grad-TTS: Romanian Text-to-Speech
15
+
16
+ Romanian adaptation of [Grad-TTS](https://arxiv.org/abs/2105.06337), trained on the [SWARA 1.0 dataset](https://speech.utcluj.ro/swarasc/).
17
+
18
+ ## Quick Start
19
+
20
+ This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, with installation and usage instructions, is hosted on GitHub at [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git).
21
+
22
+ The Romanian Grad-TTS package downloads the weights from this repository automatically as needed. To install it and run inference, follow the instructions in the GitHub repository linked above.
23
+
24
+ ## Details
25
+
26
+ - **Architecture**: Grad-TTS (diffusion-based TTS)
27
+ - **Language**: Romanian
28
+ - **Phonemization**: Espeak-ng
29
+ - **Vocoder**: HiFi-GAN (universal v1)
30
+ - **Sample rate**: 22050 Hz
31
+ - **Training data**: SWARA 1.0 Romanian speech corpus
32
+
33
+ ## Available Models
34
+
35
+ ### Baseline Model
36
+
37
+ | Model | Type | Description |
38
+ | --------- | -------- | ---------------------------------------------------- |
39
+ | **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |
40
+
41
+ ### Fine-tuned Speaker Models
42
+
43
+ | Model | Speaker | Training Samples | Fine-tune Epochs | Use Case |
44
+ | ----------- | ------------ | ---------------- | ---------------- | -------------------------------- |
45
+ | **bas_10** | BAS (Female) | 10 samples | 100 | Few-shot learning / Low-resource |
46
+ | **bas_950** | BAS (Female) | 950 samples | 100 | Production-ready speaker |
47
+ | **sgs_10** | SGS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
48
+ | **sgs_950** | SGS (Male) | 950 samples | 100 | Production-ready speaker |
49
+
50
+ **Vocoder**: Universal HiFi-GAN vocoder
51
+
52
+ ## Repository Structure
53
+
54
+ ```sh
55
+ adrianstanea/Ro-Grad-TTS/
56
+ ├── config.json                    # Model hyperparameters
57
+ ├── hifigan_config.json            # Vocoder configuration
58
+ └── models/
59
+     ├── swara/
60
+     │   └── grad-tts-base-1000.pt  # Baseline model
61
+     ├── bas/
62
+     │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
63
+     ├── sgs/
64
+     │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
65
+     └── vocoder/
66
+         └── hifigan_univ_v1        # Universal HiFi-GAN
67
+ ```
68
+
69
+ ## Citation
70
+
71
+ If you use this Romanian adaptation in your research, please cite:
72
+
73
+ ```bibtex
74
+ @ARTICLE{11269795,
75
+ author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
76
+ journal={IEEE Access},
77
+ title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
78
+ year={2025},
79
+ volume={13},
80
+ number={},
81
+ pages={203415-203428},
82
+ keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
83
+ doi={10.1109/ACCESS.2025.3637322}
84
+ }
85
+ ```
86
+
87
+ ### Original Grad-TTS Citation
88
+
89
+ ```bibtex
90
+ @article{popov2021grad,
91
+ title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
92
+ author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
93
+ journal={International Conference on Machine Learning},
94
+ year={2021}
95
+ }
96
+ ```
97
+
98
+ ## References
99
+
100
+ - [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git) - Training, documentation, and research details
101
+ - [huawei-noah/Speech-Backbones](https://github.com/huawei-noah/Speech-Backbones) - Base architecture and paper
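
The repository tree in the README uses brace shorthand such as `grad-tts-bas-{10,950}_{15,50,100}.pt`. As a rough sketch, the shorthand can be expanded into concrete repo-relative paths as follows. The helper names (`fine_tuned_path`, `all_checkpoint_paths`) are hypothetical and are not part of the Ro-Grad-TTS package; only the path pattern and the value sets come from the README.

```python
from itertools import product

# Value sets copied from the README's repository tree.
# The helper functions below are illustrative, not part of any package.
SPEAKERS = ("bas", "sgs")
TRAINING_SAMPLES = (10, 950)
FINE_TUNE_EPOCHS = (15, 50, 100)
BASELINE = "models/swara/grad-tts-base-1000.pt"


def fine_tuned_path(speaker: str, samples: int, epochs: int) -> str:
    """Build the repo-relative path of one fine-tuned checkpoint."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if samples not in TRAINING_SAMPLES or epochs not in FINE_TUNE_EPOCHS:
        raise ValueError("no checkpoint listed for this samples/epochs pair")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"


def all_checkpoint_paths() -> list[str]:
    """List the baseline plus every fine-tuned checkpoint named in the tree."""
    paths = [BASELINE]
    for speaker, samples, epochs in product(
        SPEAKERS, TRAINING_SAMPLES, FINE_TUNE_EPOCHS
    ):
        paths.append(fine_tuned_path(speaker, samples, epochs))
    return paths
```

The expansion yields one baseline and twelve fine-tuned checkpoints, which matches the `.pt` files added in this commit.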
config.json ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "model_type": "grad-tts",
3
+ "language": "ro",
4
+ "n_spks": 1,
5
+ "spk_emb_dim": 64,
6
+ "n_enc_channels": 192,
7
+ "filter_channels": 768,
8
+ "filter_channels_dp": 256,
9
+ "n_heads": 2,
10
+ "n_enc_layers": 6,
11
+ "enc_kernel": 3,
12
+ "enc_dropout": 0.1,
13
+ "window_size": 4,
14
+ "n_feats": 80,
15
+ "dec_dim": 64,
16
+ "beta_min": 0.05,
17
+ "beta_max": 20.0,
18
+ "pe_scale": 1000,
19
+ "sample_rate": 22050,
20
+ "add_blank": true
21
+ }
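
In the Grad-TTS paper, the forward diffusion uses a linear noise schedule between two endpoint values. Assuming `beta_min` and `beta_max` in `config.json` play that role here (the config itself does not say so), the schedule and its running integral can be sketched as below. The function names are illustrative only.

```python
# Endpoint values copied from config.json in this commit.
BETA_MIN = 0.05
BETA_MAX = 20.0


def beta(t: float, beta_min: float = BETA_MIN, beta_max: float = BETA_MAX) -> float:
    """Linear noise schedule beta_t = beta_min + (beta_max - beta_min) * t, t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return beta_min + (beta_max - beta_min) * t


def cumulative_beta(
    t: float, beta_min: float = BETA_MIN, beta_max: float = BETA_MAX
) -> float:
    """Closed-form integral of beta_s over s in [0, t]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return beta_min * t + 0.5 * (beta_max - beta_min) * t**2
```

With these endpoints the noise rate rises from 0.05 at `t = 0` to 20.0 at `t = 1`, and the integral over the full interval is about 10.025.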
hifigan_config.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "resblock": "1",
3
+ "num_gpus": 0,
4
+ "batch_size": 16,
5
+ "learning_rate": 0.0004,
6
+ "adam_b1": 0.8,
7
+ "adam_b2": 0.99,
8
+ "lr_decay": 0.999,
9
+ "seed": 1234,
10
+
11
+ "upsample_rates": [8,8,2,2],
12
+ "upsample_kernel_sizes": [16,16,4,4],
13
+ "upsample_initial_channel": 512,
14
+ "resblock_kernel_sizes": [3,7,11],
15
+ "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
16
+ "resblock_initial_channel": 256,
17
+
18
+ "segment_size": 8192,
19
+ "num_mels": 80,
20
+ "num_freq": 1025,
21
+ "n_fft": 1024,
22
+ "hop_size": 256,
23
+ "win_size": 1024,
24
+
25
+ "sampling_rate": 22050,
26
+
27
+ "fmin": 0,
28
+ "fmax": 8000,
29
+ "fmax_loss": null,
30
+
31
+ "num_workers": 4,
32
+
33
+ "dist_config": {
34
+ "dist_backend": "nccl",
35
+ "dist_url": "tcp://localhost:54321",
36
+ "world_size": 1
37
+ }
38
+ }
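
For the vocoder to consume the acoustic model's output, the two configs have to agree on mel channel count and sample rate, and the HiFi-GAN upsampling factors have to multiply out to the hop size. The sketch below checks those three conditions on values copied from the two JSON files in this commit. The function names are hypothetical, and the Ro-Grad-TTS package may perform different or no checks.

```python
from math import prod

# Values copied from config.json and hifigan_config.json in this commit.
ACOUSTIC = {"n_feats": 80, "sample_rate": 22050}
VOCODER = {
    "upsample_rates": [8, 8, 2, 2],
    "hop_size": 256,
    "num_mels": 80,
    "sampling_rate": 22050,
}


def compatibility_problems(acoustic: dict, vocoder: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means the configs agree."""
    problems = []
    if acoustic["n_feats"] != vocoder["num_mels"]:
        problems.append("mel channel count differs")
    if acoustic["sample_rate"] != vocoder["sampling_rate"]:
        problems.append("sample rate differs")
    if prod(vocoder["upsample_rates"]) != vocoder["hop_size"]:
        problems.append("upsampling factor does not match hop_size")
    return problems


def frames_per_second(vocoder: dict) -> float:
    """Number of mel frames per second of audio."""
    return vocoder["sampling_rate"] / vocoder["hop_size"]
```

Here 8 × 8 × 2 × 2 = 256 matches `hop_size`, so each mel frame corresponds to 256 audio samples, or roughly 86 frames per second at 22050 Hz.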
models/bas/grad-tts-bas-10_100.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2496c1451640dbf50d247f4ffc520fbb768bf5d9512f3d0875e1b0431f7625c7
3
+ size 59484571
models/bas/grad-tts-bas-10_15.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:56ea54d5cde11cff79d57c34af9d3407b1cea294ceadfe5cb9949c5caba64025
3
+ size 59484571
models/bas/grad-tts-bas-10_50.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:df6401ee7f066b7e8b83e5185030d55a29bca9ae87897dcb2b5ec41c64ef001c
3
+ size 59484571
models/bas/grad-tts-bas-950_100.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6bf8faa190b2f5fa361581b365c471327c237b1b818a390b5b7016760ad607a6
3
+ size 59484571
models/bas/grad-tts-bas-950_15.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9f9eeb8d028c84c14b20bba26c107475bef35e3cee33fec4f33d096713c52bb4
3
+ size 59484571
models/bas/grad-tts-bas-950_50.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8c0e96fe8fe2ec1f6a8f0f88f05b2923671ea56bde8cfb34306552d8db48b386
3
+ size 59484571
models/sgs/grad-tts-sgs-10_100.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3100c5f1fb2e3f2d94790e5b27b22ea990ecf01f5db694ab279eaeb2fd874e29
3
+ size 59484571
models/sgs/grad-tts-sgs-10_15.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:30b3bdaf8595f4c04f5f939839126c3e134b557a55842605770b8d4ac1b1f1d4
3
+ size 59484571
models/sgs/grad-tts-sgs-10_50.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5833afde853640b228b90ad23d006f200e7eaccab288916dff5b21864ed10de6
3
+ size 59484571
models/sgs/grad-tts-sgs-950_100.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8fe8ef94739bde87025a26c87874ff46beccad144b6696b61c6a976f8c69e919
3
+ size 59484571
models/sgs/grad-tts-sgs-950_15.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:664f6d77470501f2fbaa74477f1e828b94fe4d59f265ed6ff322c6865c55fcac
3
+ size 59484571
models/sgs/grad-tts-sgs-950_50.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9462ab4808bc7a15fd19ebe3db14084f5b00a7302c888c9b0505de531023bec4
3
+ size 59484571
models/swara/grad-tts-base-1000.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:141842ea3fc006215aa66234c5ef59b333ccd9c501f4288c3ff743f4a35c5d43
3
+ size 59484571
models/vocoder/hifigan_univ_v1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:771eaf4876485a35e25577563d390c262e23c2421e4a8c929eacfde34a5b7a60
3
+ size 55788858
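
Each checkpoint above is stored as a Git LFS pointer: a three-line text file giving the spec version, the SHA-256 of the real file, and its size in bytes. The following is a minimal sketch of how a downloaded file could be checked against such a pointer. The function names are hypothetical, and the pointer text is the `hifigan_univ_v1` entry from this commit.

```python
import hashlib

# Pointer text for models/vocoder/hifigan_univ_v1, copied from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:771eaf4876485a35e25577563d390c262e23c2421e4a8c929eacfde34a5b7a60
size 55788858
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its hash algorithm, hex digest, and byte size."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.strip().split(" ", 1)
            fields[key] = value
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    if len(data) != pointer["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == pointer["digest"]
```

For files of this size (roughly 56 to 59 MB), hashing in chunks while reading from disk would avoid holding the whole checkpoint in memory. The in-memory version is kept here for brevity.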