Respair committed on
Commit 37e8218 · verified · 1 Parent(s): d41818a

Update README.md

Files changed (1)
  1. README.md +7 -8
README.md CHANGED
@@ -16,20 +16,19 @@ tags:
  This Vocoder, is a combination of [HiFTnet](https://github.com/yl4579/HiFTNet) and [Ringformer](https://github.com/seongho608/RingFormer). it supports Ring Attention, Conformer and Neural Source Filtering etc.
  This repository is experimental, expect some bugs and some hardcoded params.

- The default setting is 44.1khz - 128 Mel bins. if you want to change it to 24khz, copy the config from HiFTnet (make sure to copy its pitch extractor, both the model + the checkpoint.), then change 128 to 80 in LN-384 of the models.py. then uncomment the "multiscale_subband_cfg" for the 24khz version.
+ The default setting is 44.1khz - 128 Mel bins. but I have provided the necessary script for the 24khz version in the LibriTTS checkpoint's folder.

  Huge Thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [STTS2 Fork](https://github.com/duerig/StyleTTS2/tree/main).

- **This is highly experimental, I have not conducted a full session training. I just tested that the loss goes down and the eval samples sound reasonable for ~10K steps of minimal training.**

- ____________________________________________________________________________________

+ **NOTE**:

- **NOTE**: I have uploaded Two checkpoints so far. one is 24khz for HiFormer, trained for roughly 117K~ steps on LibriTTS (360 + 100) and 40 hours of other English datasets.
-
- the other checkpoint is HiFTNet, 44.1khz on more than 1100 Hours of Multilingual data, sourced privately by myself. it includes Arabic, Persian, Japanese, English and Russian. this one is trained for ~100K steps.
- Ideally both should be trained up to 1M steps, so I strongly recommend you to further fine-tune it on your own downstream task until I pre-train these for more steps.
-
+ There are Three checkpoints so far in this repository:
+ - HiFormer 24khz (trained for roughly 117K~ steps on LibriTTS (360 + 100) and 40 hours of other English datasets.)
+ - HiFormer 44.1khz (trained for roughly 280K~ steps on a Large (more than 1100 hours) private Multilingual dataset, covering Arabic, Persian, Japanese, English, Russian and also Singing voice in Chinese and Japanese with Quranic recitations in Arabic.
+ - HiFTNet 44.1khz (trained for ~100K steps, on a similar dataset to HiFormer 44.1khz, but slightly smaller and no singing voice.)
+ Upda
  ## Pre-requisites
  1. Python >= 3.10
  2. Clone this repository: