vspeech committed · Commit 350820b · verified · 1 Parent(s): 8d687f6

Update README.md

  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: audio-to-audio
+ ---
+
+ # SCNet: Enhancing GAN-based Speech Generation with Subband Condition Network and Magnitude-aware Phase Loss
+
+ Recent speech generation has been driven predominantly by GAN-based networks that synthesize high-quality waveforms from mel-spectrograms. However, these methods often operate as black-box models, which leads to a loss of inherent spectral information. In this work, we propose SCNet, a GAN-based vocoder augmented with a Subband Condition Network to address this issue. Specifically, SCNet uses a subband signal predicted by a lightweight condition network as prior knowledge. This subband signal is transformed via the STFT to obtain Fourier coefficients, which are integrated into the backbone for enhanced reconstruction. Additionally, to mitigate phase wrapping, we introduce a magnitude-aware phase loss that computes instantaneous phase errors weighted by the corresponding magnitude, emphasizing regions with higher energy. Experimental results show that SCNet achieves superior performance in both objective and subjective evaluations of high-quality speech generation.
+
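The README does not spell out the magnitude-aware phase loss, so the following is only a minimal NumPy sketch of the idea in the abstract: wrap the instantaneous phase error into [-π, π], then average it with weights given by the reference magnitude. The function names, the anti-wrapping form, and the normalization are assumptions made for illustration, not the repository's implementation.

```python
import numpy as np


def anti_wrap(phase_diff):
    """Fold a phase difference into [-pi, pi] and return its absolute value."""
    two_pi = 2.0 * np.pi
    return np.abs(phase_diff - two_pi * np.round(phase_diff / two_pi))


def magnitude_aware_phase_loss(ref_spec, est_spec, eps=1e-8):
    """Magnitude-weighted mean of the wrapped phase error (hypothetical form).

    ref_spec, est_spec: complex STFT arrays of the same shape.
    High-energy bins of the reference contribute more to the loss.
    """
    weight = np.abs(ref_spec)
    phase_err = anti_wrap(np.angle(est_spec) - np.angle(ref_spec))
    return float(np.sum(weight * phase_err) / (np.sum(weight) + eps))
```

Under these assumptions, a constant phase offset of 0.5 rad on every bin gives a loss of about 0.5 regardless of the magnitudes, while a low-energy bin with a large phase error barely moves the result.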
+ ## Pre-requisites
+ 1. Python >= 3.10
+ 2. Clone this repository:
+ ```bash
+ git clone https://anonymous.4open.science/r/SCNet-94D1.git
+ cd SCNet-94D1
+ ```
+ 3. Install the Python requirements:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Pre-Trained Models
+ You can download the pre-trained LibriTTS model [here](https://drive.google.com/drive/folders/1Dn8f2PUodjME_SsfkJ8SGEtusXJNvkZI?usp=sharing) and copy it to the `cp_scnet` directory.
+ Alternatively, download it from Hugging Face:
+ ```bash
+ huggingface-cli download vspeech/SCNet --local-dir cp_scnet
+ ```
+
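The same download can likely be done from Python with `huggingface_hub.snapshot_download`. This is a sketch that assumes the `huggingface_hub` package is installed; the helper name `download_checkpoint` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def download_checkpoint(repo_id: str = "vspeech/SCNet",
                        local_dir: str = "cp_scnet") -> Path:
    """Fetch all files of the model repo into local_dir and return that path."""
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local folder that holds the files.
    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))
```

Calling `download_checkpoint()` needs network access and should leave the checkpoint files under `cp_scnet`, matching the CLI command.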
+ ## Inference
+ Please refer to `inference.py` for details.
+ ```bash
+ python inference.py \
+     --input_wavs_dir /path/to/your/input_wav \
+     --checkpoint_file /path/to/your/cp_scnet/model \
+     --output_dir /path/to/your/output_wav
+ ```
+
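To drive inference from a script, one option is to assemble the same command as an argument list and pass it to `subprocess`. The sketch below uses only the three flags shown in the README. The helper names are hypothetical, and it assumes `inference.py` is in the current working directory.

```python
import subprocess
import sys


def build_inference_command(input_wavs_dir, checkpoint_file, output_dir,
                            script="inference.py"):
    """Return the argv list for the README's inference command."""
    return [
        sys.executable, script,
        "--input_wavs_dir", str(input_wavs_dir),
        "--checkpoint_file", str(checkpoint_file),
        "--output_dir", str(output_dir),
    ]


def run_inference(input_wavs_dir, checkpoint_file, output_dir):
    """Run inference.py and raise CalledProcessError if it exits non-zero."""
    cmd = build_inference_command(input_wavs_dir, checkpoint_file, output_dir)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell line-continuation issues, since each flag and value is a separate element.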
+ ## References
+ - [rishikksh20/iSTFTNet-pytorch](https://github.com/rishikksh20/iSTFTNet-pytorch)
+ - [yl4579/HiFTNet](https://github.com/yl4579/HiFTNet)
+ - [gemelo-ai/vocos](https://github.com/gemelo-ai/vocos)
+ - [NVIDIA/BigVGAN](https://github.com/NVIDIA/BigVGAN)