jameson512 committed on
Commit d4d8960 · 1 Parent(s): fec582c

Files changed (2):
  1. README - 副本.md +190 -0
  2. README.md +3 -190

README - 副本.md ADDED
@@ -0,0 +1,190 @@
+ <div align="center">
+
+ <h1>GPT-SoVITS-WebUI</h1>
+ A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.<br><br>
+
+ [![madewithlove](https://img.shields.io/badge/made_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange)](https://github.com/RVC-Boss/GPT-SoVITS)
+
+ <img src="https://counter.seku.su/cmoe?name=gptsovits&theme=r34" /><br>
+
+ [![Licence](https://img.shields.io/badge/LICENSE-MIT-green.svg?style=for-the-badge)](https://github.com/RVC-Boss/GPT-SoVITS/blob/main/LICENSE)
+ [![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/GPT-SoVITS/tree/main)
+
+ [**English**](./README.md) | [**中文简体**](./docs/cn/README.md) | [**日本語**](./docs/ja/README.md)
+
+ </div>
+
+ ------
+
+ > Check out our [demo video](https://www.bilibili.com/video/BV12g4y1m7Uw) here!
+
+ https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
+
+ ## Features
+
+ 1. **Zero-shot TTS:** Input a 5-second vocal sample and experience instant text-to-speech conversion.
+
+ 2. **Few-shot TTS:** Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
+
+ 3. **Cross-lingual Support:** Inference in languages different from the training dataset; English, Japanese, and Chinese are currently supported.
+
+ 4. **WebUI Tools:** Integrated tools include vocal/accompaniment separation, automatic training-set segmentation, Chinese ASR, and text labeling, helping beginners create training datasets and GPT/SoVITS models.
+
+ ## Environment Preparation
+
+ If you are a Windows user (tested with Windows 10 and later), you can install directly via the prezip: download the [prezip](https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true), unzip it, and double-click go-webui.bat to start GPT-SoVITS-WebUI.
+
+ ### Tested Environments
+
+ - Python 3.9, PyTorch 2.0.1, CUDA 11
+ - Python 3.10.13, PyTorch 2.1.2, CUDA 12.3
+
+ _Note: numba==0.56.4 requires Python < 3.11._
+
+ ### Quick Install with Conda
+
+ ```bash
+ conda create -n GPTSoVits python=3.9
+ conda activate GPTSoVits
+ bash install.sh
+ ```
+
+ ### Install Manually
+
+ #### Pip Packages
+
+ ```bash
+ pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm cn2an pypinyin pyopenjtalk g2p_en chardet transformers jieba_fast
+ ```
+
+ #### Additional Requirements
+
+ If you need Chinese ASR (supported by FunASR), install:
+
+ ```bash
+ pip install modelscope torchaudio sentencepiece funasr
+ ```
+
+ #### FFmpeg
+
+ ##### Conda Users
+
+ ```bash
+ conda install ffmpeg
+ ```
+
+ ##### Ubuntu/Debian Users
+
+ ```bash
+ sudo apt install ffmpeg
+ sudo apt install libsox-dev
+ conda install -c conda-forge 'ffmpeg<7'
+ ```
+
+ ##### MacOS Users
+
+ ```bash
+ brew install ffmpeg
+ ```
+
+ ##### Windows Users
+
+ Download [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) and place them in the GPT-SoVITS root.
+
+ ### Pretrained Models
+
+ Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS/pretrained_models`.
+
+ For Chinese ASR (additionally), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files), and place them in `tools/damo_asr/models`.
+
+ For UVR5 (vocals/accompaniment separation & reverberation removal, additionally), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
+
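The three destination directories above can be captured in a small helper for scripting the setup (an illustrative sketch; the `destination` helper and bundle names are hypothetical, not part of GPT-SoVITS):

```python
# Illustrative mapping of model bundles to the repo-relative destination
# directories the README specifies. The helper name and bundle keys are
# hypothetical, not part of GPT-SoVITS itself.
MODEL_DIRS = {
    "pretrained": "GPT_SoVITS/pretrained_models",  # GPT-SoVITS Models
    "damo_asr": "tools/damo_asr/models",           # Damo ASR/VAD/Punc models
    "uvr5": "tools/uvr5/uvr5_weights",             # UVR5 weights
}

def destination(bundle: str) -> str:
    """Return the repo-relative directory where a model bundle belongs."""
    try:
        return MODEL_DIRS[bundle]
    except KeyError:
        raise ValueError(f"unknown model bundle: {bundle!r}") from None

print(destination("uvr5"))  # tools/uvr5/uvr5_weights
```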
+ ### Using Docker
+
+ #### docker-compose.yaml Configuration
+
+ 1. Environment variables:
+    - is_half: Controls half precision (fp16) vs. full precision (fp32). This is typically the cause if the content under the directories 4-cnhubert/5-wav32k is not generated correctly during the "SSL extracting" step. Set it to True or False based on your actual situation.
+
+ 2. Volumes configuration: The application's root directory inside the container is set to /workspace. The default docker-compose.yaml lists some practical examples for uploading/downloading content.
+ 3. shm_size: The default shared memory for Docker Desktop on Windows is too small, which can cause abnormal behavior. Adjust it to suit your situation.
+ 4. Under the deploy section, GPU-related settings should be adjusted cautiously according to your system and actual circumstances.
+
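The four settings above might be sketched in a compose file roughly as follows (a hypothetical fragment, not the repo's actual docker-compose.yaml; the service name and host paths are placeholders, while the image tag matches the docker run example in this README):

```yaml
# Hypothetical docker-compose.yaml fragment illustrating the four settings
# above; service name and host paths are placeholders to adapt.
services:
  gpt-sovits:
    image: breakstring/gpt-sovits:dev-20240123.03
    environment:
      - is_half=False              # item 1: half (fp16) vs. full (fp32) precision
    volumes:
      - ./output:/workspace/output # item 2: container root is /workspace
      - ./logs:/workspace/logs
    shm_size: "16G"                # item 3: raise Docker Desktop's small default
    deploy:                        # item 4: GPU settings, adjust to your system
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```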
+ #### Running with docker compose
+
+ ```bash
+ docker compose -f "docker-compose.yaml" up -d
+ ```
+
+ #### Running with docker command
+
+ As above, modify the corresponding parameters based on your actual situation, then run the following command:
+
+ ```bash
+ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-DockerTest\output:/workspace/output --volume=G:\GPT-SoVITS-DockerTest\logs:/workspace/logs --volume=G:\GPT-SoVITS-DockerTest\SoVITS_weights:/workspace/SoVITS_weights --workdir=/workspace -p 9870:9870 -p 9871:9871 -p 9872:9872 -p 9873:9873 -p 9874:9874 --shm-size="16G" -d breakstring/gpt-sovits:dev-20240123.03
+ ```
+
+ ## Dataset Format
+
+ The TTS annotation .list file format:
+
+ ```
+ vocal_path|speaker_name|language|text
+ ```
+
+ Language dictionary:
+
+ - 'zh': Chinese
+ - 'ja': Japanese
+ - 'en': English
+
+ Example:
+
+ ```
+ D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+ ```
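The annotation format above can be consumed with a few lines of Python (a minimal sketch; the `parse_list_line` helper and its validation are illustrative, not part of GPT-SoVITS):

```python
# Minimal sketch of a parser for the TTS annotation .list format:
#   vocal_path|speaker_name|language|text
# The function name and validation are illustrative, not part of GPT-SoVITS.

VALID_LANGS = {"zh", "ja", "en"}

def parse_list_line(line: str) -> dict:
    """Split one annotation line into its four fields."""
    # maxsplit=3 keeps any '|' characters inside the text field intact
    vocal_path, speaker_name, language, text = line.rstrip("\n").split("|", 3)
    if language not in VALID_LANGS:
        raise ValueError(f"unknown language code: {language!r}")
    return {
        "vocal_path": vocal_path,
        "speaker_name": speaker_name,
        "language": language,
        "text": text,
    }

entry = parse_list_line(r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.")
print(entry["language"])  # en
```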
+ ## Todo List
+
+ - [ ] **High Priority:**
+   - [ ] Localization in Japanese and English.
+   - [ ] User guide.
+   - [ ] Fine-tune training with Japanese and English datasets.
+
+ - [ ] **Features:**
+   - [ ] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
+   - [ ] TTS speaking speed control.
+   - [ ] Enhanced TTS emotion control.
+   - [ ] Experiment with changing SoVITS token inputs to a probability distribution over the vocabulary.
+   - [ ] Improve the English and Japanese text frontend.
+   - [ ] Develop tiny and larger-sized TTS models.
+   - [ ] Colab scripts.
+   - [ ] Try expanding the training dataset (2k hours -> 10k hours).
+   - [ ] Better SoVITS base model (enhanced audio quality).
+   - [ ] Model mixing.
+
+ ## Credits
+
+ Special thanks to the following projects and contributors:
+
+ - [ar-vits](https://github.com/innnky/ar-vits)
+ - [SoundStorm](https://github.com/yangdongchao/SoundStorm/tree/master/soundstorm/s1/AR)
+ - [vits](https://github.com/jaywalnut310/vits)
+ - [TransferTTS](https://github.com/hcy71o/TransferTTS/blob/master/models.py#L556)
+ - [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
+ - [contentvec](https://github.com/auspicious3000/contentvec/)
+ - [hifi-gan](https://github.com/jik876/hifi-gan)
+ - [Chinese-Roberta-WWM-Ext-Large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
+ - [fish-speech](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41)
+ - [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
+ - [audio-slicer](https://github.com/openvpi/audio-slicer)
+ - [SubFix](https://github.com/cronrpc/SubFix)
+ - [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ - [gradio](https://github.com/gradio-app/gradio)
+
+ ## Thanks to all contributors for their efforts
+ <a href="https://github.com/RVC-Boss/GPT-SoVITS/graphs/contributors" target="_blank">
+   <img src="https://contrib.rocks/image?repo=RVC-Boss/GPT-SoVITS" />
+ </a>
README.md CHANGED
@@ -1,190 +1,3 @@
(The 190 removed lines are identical to the content of README - 副本.md shown above and are omitted here.)
+ preload_from_hub:
+ - test
+ - mortimerme/sovits