Update README.md
README.md
CHANGED
@@ -16,20 +16,19 @@ tags:
 This vocoder is a combination of [HiFTNet](https://github.com/yl4579/HiFTNet) and [RingFormer](https://github.com/seongho608/RingFormer). It supports Ring Attention, Conformer, Neural Source Filtering, and more.
 This repository is experimental; expect some bugs and some hardcoded parameters.
 
-The default setting is 44.1 kHz with 128 mel bins.
+The default setting is 44.1 kHz with 128 mel bins, but I have provided the necessary script for the 24 kHz version in the LibriTTS checkpoint's folder.
 
 Huge thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [STTS2 fork](https://github.com/duerig/StyleTTS2/tree/main).
 
-**This is highly experimental. I have not run a full training session; I only verified that the loss goes down and that the eval samples sound reasonable after ~10K steps of minimal training.**
 
-____________________________________________________________________________________
 
+**NOTE**:
 
-
-
-
-
-
+There are three checkpoints so far in this repository:
+- HiFormer 24 kHz (trained for roughly 117K steps on LibriTTS (360 + 100) and 40 hours of other English datasets)
+- HiFormer 44.1 kHz (trained for roughly 280K steps on a large (more than 1,100 hours) private multilingual dataset covering Arabic, Persian, Japanese, English, and Russian, plus singing voice in Chinese and Japanese and Quranic recitations in Arabic)
+- HiFTNet 44.1 kHz (trained for ~100K steps on a dataset similar to the HiFormer 44.1 kHz one, but slightly smaller and without singing voice)
+Upda
 ## Pre-requisites
 1. Python >= 3.10
 2. Clone this repository: