adrianstanea committed on Feb 28

Commit

21f6b3c

1 Parent(s): a6a634a

Add model checkpoints and config files


Signed-off-by: adrianstanea <adrianstanea1@gmail.com>

Files changed (18)

.gitattributes +1 -0
README.md +98 -0
config.json +21 -0
hifigan_config.json +38 -0
models/bas/grad-tts-bas-10_100.pt +3 -0
models/bas/grad-tts-bas-10_15.pt +3 -0
models/bas/grad-tts-bas-10_50.pt +3 -0
models/bas/grad-tts-bas-950_100.pt +3 -0
models/bas/grad-tts-bas-950_15.pt +3 -0
models/bas/grad-tts-bas-950_50.pt +3 -0
models/sgs/grad-tts-sgs-10_100.pt +3 -0
models/sgs/grad-tts-sgs-10_15.pt +3 -0
models/sgs/grad-tts-sgs-10_50.pt +3 -0
models/sgs/grad-tts-sgs-950_100.pt +3 -0
models/sgs/grad-tts-sgs-950_15.pt +3 -0
models/sgs/grad-tts-sgs-950_50.pt +3 -0
models/swara/grad-tts-base-1000.pt +3 -0
models/vocoder/hifigan_univ_v1 +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+models/vocoder/hifigan_univ_v1 filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,101 @@
 ---
 license: apache-2.0
+language:
+  - ro
+tags:
+  - text-to-speech
+  - Grad-TTS
+  - Diffusion
+library_name: pytorch
+datasets:
+  - SWARA-1.0
 ---
+# Ro-Grad-TTS: Romanian Text-to-Speech
+Romanian adaptation of [Grad-TTS](https://arxiv.org/abs/2105.06337), trained on the [SWARA 1.0 dataset](https://speech.utcluj.ro/swarasc/).
+## Quick Start
+This repository only contains the pretrained model weights for Romanian Grad-TTS. The actual package for Romanian TTS inference, including installation and usage instructions, is hosted on GitHub at [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git).
+When using the Romanian Grad-TTS package, the weights from this repository will be automatically downloaded as needed. To install and run Romanian TTS inference, please follow the instructions in the main repository linked above.
+## Details
+- **Architecture**: Grad-TTS (diffusion-based TTS)
+- **Language**: Romanian
+- **Phonemization**: Espeak-ng
+- **Vocoder**: HiFi-GAN (universal v1)
+- **Sample rate**: 22050 Hz
+- **Training data**: SWARA 1.0 Romanian speech corpus
+## Available Models
+### Baseline Model
+| Model     | Type     | Description                                          |
+| --------- | -------- | ---------------------------------------------------- |
+| **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |
+### Fine-tuned Speaker Models
+| Model       | Speaker      | Training Samples | Fine-tune Epochs | Use Case                         |
+| ----------- | ------------ | ---------------- | ---------------- | -------------------------------- |
+| **bas_10**  | BAS (Female) | 10 samples       | 100              | Few-shot learning / Low-resource |
+| **bas_950** | BAS (Female) | 950 samples      | 100              | Production-ready speaker         |
+| **sgs_10**  | SGS (Male)   | 10 samples       | 100              | Few-shot learning / Low-resource |
+| **sgs_950** | SGS (Male)   | 950 samples      | 100              | Production-ready speaker         |
+**Vocoder**: Universal HiFi-GAN vocoder
+## Repository Structure
+```sh
+adrianstanea/Ro-Grad-TTS/
+├── config.json                                      # Model hyperparameters
+├── hifigan_config.json                              # Vocoder configuration
+└── models/
+    ├── swara/
+    │   └── grad-tts-base-1000.pt                    # Baseline model
+    ├── bas/
+    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
+    ├── sgs/
+    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
+    └── vocoder/
+        └── hifigan_univ_v1                          # Universal HiFi-GAN
+```
+## Citation
+If you use this Romanian adaptation in your research, please cite:
+```bibtex
+@ARTICLE{11269795,
+  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
+  journal={IEEE Access},
+  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
+  year={2025},
+  volume={13},
+  number={},
+  pages={203415-203428},
+  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
+  doi={10.1109/ACCESS.2025.3637322}
+}
+```
+### Original Grad-TTS Citation
+```bibtex
+@article{popov2021grad,
+  title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
+  author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
+  journal={International Conference on Machine Learning},
+  year={2021}
+}
+```
+## References
+- [adrianstanea/Ro-Grad-TTS](https://github.com/adrianstanea/Ro-Grad-TTS.git) - Training, documentation, and research details
+- [huawei-noah/Speech-Backbones](https://github.com/huawei-noah/Speech-Backbones) - Base architecture and paper
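
The brace patterns in the README's "Repository Structure" listing are shorthand for twelve fine-tuned checkpoints plus the baseline, which matches the thirteen `.pt` files added in this commit. A minimal Python sketch of that expansion follows. The names `SPEAKERS`, `SAMPLE_COUNTS`, `EPOCHS`, and `checkpoint_paths` are illustrative only, and reading the second number in each filename as the fine-tuning epoch count is an assumption that the README does not state outright.

```python
from itertools import product

# Values read off the README's "Repository Structure" listing.
SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)   # number of fine-tuning samples
EPOCHS = (15, 50, 100)      # assumed to be the fine-tuning epoch count


def checkpoint_paths():
    """Return every checkpoint path implied by the listing, baseline first."""
    paths = ["models/swara/grad-tts-base-1000.pt"]
    for speaker, samples, epochs in product(SPEAKERS, SAMPLE_COUNTS, EPOCHS):
        paths.append(f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt")
    return paths


if __name__ == "__main__":
    for path in checkpoint_paths():
        print(path)
```

The vocoder file (`models/vocoder/hifigan_univ_v1`) is left out on purpose, since it has no `.pt` suffix and follows a different naming scheme.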

config.json ADDED

@@ -0,0 +1,21 @@
+{
+    "model_type": "grad-tts",
+    "language": "ro",
+    "n_spks": 1,
+    "spk_emb_dim": 64,
+    "n_enc_channels": 192,
+    "filter_channels": 768,
+    "filter_channels_dp": 256,
+    "n_heads": 2,
+    "n_enc_layers": 6,
+    "enc_kernel": 3,
+    "enc_dropout": 0.1,
+    "window_size": 4,
+    "n_feats": 80,
+    "dec_dim": 64,
+    "beta_min": 0.05,
+    "beta_max": 20.0,
+    "pe_scale": 1000,
+    "sample_rate": 22050,
+    "add_blank": true
+}
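
As a reading aid for `beta_min` and `beta_max`: the original Grad-TTS paper uses a linear noise schedule between these two values over diffusion time `t` in `[0, 1]`. The sketch below assumes this checkpoint keeps that schedule, which the commit itself does not confirm. The function names are hypothetical and are not taken from the Ro-Grad-TTS package.

```python
import math

BETA_MIN = 0.05  # "beta_min" in config.json
BETA_MAX = 20.0  # "beta_max" in config.json


def beta(t, beta_min=BETA_MIN, beta_max=BETA_MAX):
    """Instantaneous noise level at time t, assuming a linear schedule."""
    return beta_min + (beta_max - beta_min) * t


def cumulative_beta(t, beta_min=BETA_MIN, beta_max=BETA_MAX):
    """Integral of beta(s) for s from 0 to t."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2


def signal_weight(t):
    """Weight left on the clean mel-spectrogram at time t."""
    return math.exp(-0.5 * cumulative_beta(t))


def noise_variance(t):
    """Variance of the Gaussian noise added by time t."""
    return 1.0 - math.exp(-cumulative_beta(t))


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 1.0):
        print(t, beta(t), signal_weight(t), noise_variance(t))
```

Under these assumptions, almost all of the clean signal is gone by `t = 1`, so sampling can start from noise centred on the encoder output.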

hifigan_config.json ADDED

@@ -0,0 +1,38 @@
+{
+    "resblock": "1",
+    "num_gpus": 0,
+    "batch_size": 16,
+    "learning_rate": 0.0004,
+    "adam_b1": 0.8,
+    "adam_b2": 0.99,
+    "lr_decay": 0.999,
+    "seed": 1234,
+    "upsample_rates": [8,8,2,2],
+    "upsample_kernel_sizes": [16,16,4,4],
+    "upsample_initial_channel": 512,
+    "resblock_kernel_sizes": [3,7,11],
+    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
+    "resblock_initial_channel": 256,
+    "segment_size": 8192,
+    "num_mels": 80,
+    "num_freq": 1025,
+    "n_fft": 1024,
+    "hop_size": 256,
+    "win_size": 1024,
+    "sampling_rate": 22050,
+    "fmin": 0,
+    "fmax": 8000,
+    "fmax_loss": null,
+    "num_workers": 4,
+    "dist_config": {
+        "dist_backend": "nccl",
+        "dist_url": "tcp://localhost:54321",
+        "world_size": 1
+    }
+}
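
The two configs in this commit have to agree on a few values for the vocoder to accept the acoustic model's output. The sketch below copies the relevant fields from `config.json` and `hifigan_config.json` and checks them. The helper name `compatibility_problems` is hypothetical, and the rule that the product of `upsample_rates` should equal `hop_size` reflects the usual HiFi-GAN design (one mel frame becomes `hop_size` audio samples), not something this commit states.

```python
import math

# Fields copied from config.json (acoustic model) in this commit.
ACOUSTIC = {"n_feats": 80, "sample_rate": 22050}

# Fields copied from hifigan_config.json (vocoder) in this commit.
VOCODER = {
    "upsample_rates": [8, 8, 2, 2],
    "hop_size": 256,
    "segment_size": 8192,
    "num_mels": 80,
    "sampling_rate": 22050,
}


def compatibility_problems(acoustic, vocoder):
    """Return a list of human-readable mismatches; an empty list means no issue found."""
    problems = []
    if acoustic["n_feats"] != vocoder["num_mels"]:
        problems.append("mel channel count differs")
    if acoustic["sample_rate"] != vocoder["sampling_rate"]:
        problems.append("sample rate differs")
    if math.prod(vocoder["upsample_rates"]) != vocoder["hop_size"]:
        problems.append("total upsampling factor does not equal hop_size")
    if vocoder["segment_size"] % vocoder["hop_size"] != 0:
        problems.append("segment_size is not a whole number of frames")
    return problems


if __name__ == "__main__":
    print(compatibility_problems(ACOUSTIC, VOCODER))
    print("frames per second:", VOCODER["sampling_rate"] / VOCODER["hop_size"])
    print("frames per segment:", VOCODER["segment_size"] // VOCODER["hop_size"])
```

With the values above, the upsampling factor is 8 × 8 × 2 × 2 = 256, which equals `hop_size`, and a training segment of 8192 samples covers 32 mel frames.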

models/bas/grad-tts-bas-10_100.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2496c1451640dbf50d247f4ffc520fbb768bf5d9512f3d0875e1b0431f7625c7
+size 59484571

models/bas/grad-tts-bas-10_15.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:56ea54d5cde11cff79d57c34af9d3407b1cea294ceadfe5cb9949c5caba64025
+size 59484571

models/bas/grad-tts-bas-10_50.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:df6401ee7f066b7e8b83e5185030d55a29bca9ae87897dcb2b5ec41c64ef001c
+size 59484571

models/bas/grad-tts-bas-950_100.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6bf8faa190b2f5fa361581b365c471327c237b1b818a390b5b7016760ad607a6
+size 59484571

models/bas/grad-tts-bas-950_15.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9f9eeb8d028c84c14b20bba26c107475bef35e3cee33fec4f33d096713c52bb4
+size 59484571

models/bas/grad-tts-bas-950_50.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8c0e96fe8fe2ec1f6a8f0f88f05b2923671ea56bde8cfb34306552d8db48b386
+size 59484571

models/sgs/grad-tts-sgs-10_100.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3100c5f1fb2e3f2d94790e5b27b22ea990ecf01f5db694ab279eaeb2fd874e29
+size 59484571

models/sgs/grad-tts-sgs-10_15.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:30b3bdaf8595f4c04f5f939839126c3e134b557a55842605770b8d4ac1b1f1d4
+size 59484571

models/sgs/grad-tts-sgs-10_50.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5833afde853640b228b90ad23d006f200e7eaccab288916dff5b21864ed10de6
+size 59484571

models/sgs/grad-tts-sgs-950_100.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8fe8ef94739bde87025a26c87874ff46beccad144b6696b61c6a976f8c69e919
+size 59484571

models/sgs/grad-tts-sgs-950_15.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:664f6d77470501f2fbaa74477f1e828b94fe4d59f265ed6ff322c6865c55fcac
+size 59484571

models/sgs/grad-tts-sgs-950_50.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9462ab4808bc7a15fd19ebe3db14084f5b00a7302c888c9b0505de531023bec4
+size 59484571

models/swara/grad-tts-base-1000.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:141842ea3fc006215aa66234c5ef59b333ccd9c501f4288c3ff743f4a35c5d43
+size 59484571

models/vocoder/hifigan_univ_v1 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:771eaf4876485a35e25577563d390c262e23c2421e4a8c929eacfde34a5b7a60
+size 55788858
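
Each checkpoint above is stored as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The following sketch shows one way to parse such a pointer and check a downloaded file against it. The function names are hypothetical, and stripping a leading `+` is only there so that lines pasted straight from this diff can be used as input.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("+").strip()  # tolerate diff "+" prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    hasher = hashlib.new(pointer["algorithm"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer for models/vocoder/hifigan_univ_v1, copied from this commit.
VOCODER_POINTER = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:771eaf4876485a35e25577563d390c262e23c2421e4a8c929eacfde34a5b7a60
+size 55788858
"""
```

Calling `matches_pointer("hifigan_univ_v1", parse_lfs_pointer(VOCODER_POINTER))` on a local copy would then confirm that the download is complete and unmodified. The local path here is a placeholder.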