Respair/RiFornet_Vocoder


Respair committed on Feb 10, 2025

Commit 37e8218 · verified · 1 Parent(s): d41818a

Update README.md

Files changed (1)

README.md +7 -8


@@ -16,20 +16,19 @@ tags:
 This Vocoder, is a combination of [HiFTnet](https://github.com/yl4579/HiFTNet) and [Ringformer](https://github.com/seongho608/RingFormer). it supports Ring Attention, Conformer and Neural Source Filtering etc.
 This repository is experimental, expect some bugs and some hardcoded params.
-The default setting is 44.1khz - 128 Mel bins. if you want to change it to 24khz, copy the config from HiFTnet (make sure to copy its pitch extractor, both the model + the checkpoint.), then change 128 to 80 in LN-384 of the models.py. then uncomment the "multiscale_subband_cfg" for the 24khz version.
 Huge Thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [STTS2 Fork](https://github.com/duerig/StyleTTS2/tree/main).
-**This is highly experimental, I have not conducted a full session training. I just tested that the loss goes down and the eval samples sound reasonable for ~10K steps of minimal training.**
-____________________________________________________________________________________
-**NOTE**: I have uploaded Two checkpoints so far. one is 24khz for HiFormer, trained for roughly 117K~ steps on LibriTTS (360 + 100) and 40 hours of other English datasets.
-the other checkpoint is HiFTNet, 44.1khz on more than 1100 Hours of Multilingual data, sourced privately by myself. it includes Arabic, Persian, Japanese, English and Russian. this one is trained for ~100K steps.
-Ideally both should be trained up to 1M steps, so I strongly recommend you to further fine-tune it on your own downstream task until I pre-train these for more steps.
 ## Pre-requisites
 1. Python >= 3.10
 2. Clone this repository:

 This Vocoder, is a combination of [HiFTnet](https://github.com/yl4579/HiFTNet) and [Ringformer](https://github.com/seongho608/RingFormer). it supports Ring Attention, Conformer and Neural Source Filtering etc.
 This repository is experimental, expect some bugs and some hardcoded params.
+The default setting is 44.1khz - 128 Mel bins. but I have provided the necessary script for the 24khz version in the LibriTTS checkpoint's folder.
 Huge Thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [STTS2 Fork](https://github.com/duerig/StyleTTS2/tree/main).
+**NOTE**:
+There are Three checkpoints so far in this repository:
+  - HiFormer 24khz (trained for roughly 117K~ steps on LibriTTS (360 + 100) and 40 hours of other English datasets.)
+  - HiFormer 44.1khz (trained for roughly 280K~ steps on a Large (more than 1100 hours) private Multilingual dataset, covering Arabic, Persian, Japanese, English, Russian and also Singing voice in Chinese and Japanese with Quranic recitations in Arabic.
+  - HiFTNet 44.1khz (trained for ~100K steps, on a similar dataset to HiFormer 44.1khz, but slightly smaller and no singing voice.)
+Upda
 ## Pre-requisites
 1. Python >= 3.10
 2. Clone this repository: