AlexBidlovtam committed
Commit 7b78ecc · 1 Parent(s): 14add55

Upload 12 files

Files changed (12):
  1. README.md +50 -5
  2. app.py +2089 -0
  3. config.py +204 -0
  4. gitattributes.txt +35 -0
  5. gitignore.txt +12 -0
  6. i18n.py +28 -0
  7. packages.txt +3 -0
  8. requirements.txt +22 -0
  9. rmvpe.py +432 -0
  10. run.sh +16 -0
  11. utils.py +151 -0
  12. vc_infer_pipeline.py +646 -0
README.md CHANGED
@@ -1,12 +1,57 @@
 ---
-title: RVC
-emoji: 🏆
+title: RVC V2
+emoji: 💻
 colorFrom: blue
-colorTo: green
+colorTo: purple
 sdk: gradio
-sdk_version: 4.0.2
+sdk_version: 3.42.0
 app_file: app.py
 pinned: false
+license: lgpl-3.0
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+## 🔧 Pre-requisites
+
+Before running the project, you must have the following tool installed on your machine:
+* [Python v3.8.0](https://www.python.org/downloads/release/python-380/)
+
+You will also need to clone the repository:
+
+```bash
+# Clone the repository
+git clone https://huggingface.co/spaces/mateuseap/magic-vocals/
+# Enter the root directory
+cd magic-vocals
+```
+
+## 🚀 How to run
+
+After you've cloned the repository and entered the root directory, run the following commands:
+
+```bash
+# Create and activate a virtual environment (make sure you're using Python v3.8.0)
+python -m venv venv
+. venv/bin/activate
+
+# Make the shell script executable, then run it to configure and start the application
+chmod +x run.sh
+./run.sh
+```
+
+After the shell script finishes, the application will be running at http://127.0.0.1:7860. Open the link in a browser to use the app:
+
+![Magic Vocals](https://i.imgur.com/V55oKv8.png)
+
+**You only need to execute `run.sh` once.** After that, just activate the virtual environment and run the command below to start the app again:
+
+```bash
+python app.py
+```
+
+**`run.sh` is supported on the following operating systems:**
+
+| OS | Supported |
+|-----------|:---------:|
+| `Windows` | ❌ |
+| `Ubuntu` | ✅ |
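Since the README above requires the virtual environment to be created with Python v3.8.0 specifically, it can help to verify the interpreter version before running `python -m venv venv`. A minimal sketch of such a check (the `is_supported` helper is ours for illustration, not part of the repository):

```python
import sys

def is_supported(version_info, required=(3, 8)):
    """Return True when the interpreter's major/minor version matches `required`."""
    return tuple(version_info[:2]) == required

# Check the running interpreter before creating the venv
print(is_supported(sys.version_info))

print(is_supported((3, 8, 0)))   # True
print(is_supported((3, 11, 4)))  # False
```

If the check fails, point `python` at a 3.8.0 installation (e.g. via `pyenv` or an explicit path) before creating the venv.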
app.py ADDED
@@ -0,0 +1,2089 @@
+import subprocess, torch, os, traceback, sys, warnings, shutil, numpy as np
+from mega import Mega
+os.environ["no_proxy"] = "localhost, 127.0.0.1, ::1"
+import threading
+from pathlib import Path
+from time import sleep
+from subprocess import Popen
+import faiss
+from random import shuffle
+import json, datetime, requests
+from gtts import gTTS
+now_dir = os.getcwd()
+sys.path.append(now_dir)
+tmp = os.path.join(now_dir, "TEMP")
+shutil.rmtree(tmp, ignore_errors=True)
+shutil.rmtree("%s/runtime/Lib/site-packages/infer_pack" % (now_dir), ignore_errors=True)
+os.makedirs(tmp, exist_ok=True)
+os.makedirs(os.path.join(now_dir, "logs"), exist_ok=True)
+os.makedirs(os.path.join(now_dir, "weights"), exist_ok=True)
+os.environ["TEMP"] = tmp
+warnings.filterwarnings("ignore")
+torch.manual_seed(114514)
+from i18n import I18nAuto
+
+import signal
+
+import math
+
+from utils import load_audio, CSVutil
+
+global DoFormant, Quefrency, Timbre
+
+if not os.path.isdir('csvdb/'):
+    os.makedirs('csvdb')
+    frmnt, stp = open("csvdb/formanting.csv", 'w'), open("csvdb/stop.csv", 'w')
+    frmnt.close()
+    stp.close()
+
+try:
+    DoFormant, Quefrency, Timbre = CSVutil('csvdb/formanting.csv', 'r', 'formanting')
+    DoFormant = (
+        lambda DoFormant: True if DoFormant.lower() == 'true' else (False if DoFormant.lower() == 'false' else DoFormant)
+    )(DoFormant)
+except (ValueError, TypeError, IndexError):
+    DoFormant, Quefrency, Timbre = False, 1.0, 1.0
+    CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, Quefrency, Timbre)
+
+def download_models():
+    # Download the hubert base model if not present
+    if not os.path.isfile('./hubert_base.pt'):
+        response = requests.get('https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt')
+
+        if response.status_code == 200:
+            with open('./hubert_base.pt', 'wb') as f:
+                f.write(response.content)
+            print("Downloaded hubert base model file successfully. File saved to ./hubert_base.pt.")
+        else:
+            raise Exception("Failed to download hubert base model file. Status code: " + str(response.status_code) + ".")
+
+    # Download the rmvpe model if not present
+    if not os.path.isfile('./rmvpe.pt'):
+        response = requests.get('https://drive.usercontent.google.com/download?id=1Hkn4kNuVFRCNQwyxQFRtmzmMBGpQxptI&export=download&authuser=0&confirm=t&uuid=0b3a40de-465b-4c65-8c41-135b0b45c3f7&at=APZUnTV3lA3LnyTbeuduura6Dmi2:1693724254058')
+
+        if response.status_code == 200:
+            with open('./rmvpe.pt', 'wb') as f:
+                f.write(response.content)
+            print("Downloaded rmvpe model file successfully. File saved to ./rmvpe.pt.")
+        else:
+            raise Exception("Failed to download rmvpe model file. Status code: " + str(response.status_code) + ".")
+
+download_models()
+
+print("\n-------------------------------\nRVC v2 Easy GUI (Local Edition)\n-------------------------------\n")
+
+def formant_apply(qfrency, tmbre):
+    Quefrency = qfrency
+    Timbre = tmbre
+    DoFormant = True
+    CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, qfrency, tmbre)
+
+    return ({"value": Quefrency, "__type__": "update"}, {"value": Timbre, "__type__": "update"})
+
+def get_fshift_presets():
+    fshift_presets_list = []
+    for dirpath, _, filenames in os.walk("./formantshiftcfg/"):
+        for filename in filenames:
+            if filename.endswith(".txt"):
+                fshift_presets_list.append(os.path.join(dirpath, filename).replace('\\', '/'))
+
+    if len(fshift_presets_list) > 0:
+        return fshift_presets_list
+    else:
+        return ''
+
+def formant_enabled(cbox, qfrency, tmbre, frmntapply, formantpreset, formant_refresh_button):
+    if cbox:
+        DoFormant = True
+        CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, qfrency, tmbre)
+        # print(f"is checked? - {cbox}\ngot {DoFormant}")
+
+        return (
+            {"value": True, "__type__": "update"},
+            {"visible": True, "__type__": "update"},
+            {"visible": True, "__type__": "update"},
+            {"visible": True, "__type__": "update"},
+            {"visible": True, "__type__": "update"},
+            {"visible": True, "__type__": "update"},
+        )
+    else:
+        DoFormant = False
+        CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, qfrency, tmbre)
+        # print(f"is checked? - {cbox}\ngot {DoFormant}")
+
+        return (
+            {"value": False, "__type__": "update"},
+            {"visible": False, "__type__": "update"},
+            {"visible": False, "__type__": "update"},
+            {"visible": False, "__type__": "update"},
+            {"visible": False, "__type__": "update"},
+            {"visible": False, "__type__": "update"},
+            {"visible": False, "__type__": "update"},
+        )
+
+def preset_apply(preset, qfer, tmbr):
+    if str(preset) != '':
+        with open(str(preset), 'r') as p:
+            content = p.readlines()
+            qfer, tmbr = content[0].split('\n')[0], content[1]
+            formant_apply(qfer, tmbr)
+    else:
+        pass
+    return ({"value": qfer, "__type__": "update"}, {"value": tmbr, "__type__": "update"})
+
+def update_fshift_presets(preset, qfrency, tmbre):
+    qfrency, tmbre = preset_apply(preset, qfrency, tmbre)
+
+    if str(preset) != '':
+        with open(str(preset), 'r') as p:
+            content = p.readlines()
+            qfrency, tmbre = content[0].split('\n')[0], content[1]
+            formant_apply(qfrency, tmbre)
+    else:
+        pass
+    return (
+        {"choices": get_fshift_presets(), "__type__": "update"},
+        {"value": qfrency, "__type__": "update"},
+        {"value": tmbre, "__type__": "update"},
+    )
+
+i18n = I18nAuto()
+# i18n.print()
+# Check whether there is an NVIDIA GPU usable for training and accelerated inference
+ngpu = torch.cuda.device_count()
+gpu_infos = []
+mem = []
+if (not torch.cuda.is_available()) or ngpu == 0:
+    if_gpu_ok = False
+else:
+    if_gpu_ok = False
+    for i in range(ngpu):
+        gpu_name = torch.cuda.get_device_name(i)
+        if (
+            "10" in gpu_name
+            or "16" in gpu_name
+            or "20" in gpu_name
+            or "30" in gpu_name
+            or "40" in gpu_name
+            or "A2" in gpu_name.upper()
+            or "A3" in gpu_name.upper()
+            or "A4" in gpu_name.upper()
+            or "P4" in gpu_name.upper()
+            or "A50" in gpu_name.upper()
+            or "A60" in gpu_name.upper()
+            or "70" in gpu_name
+            or "80" in gpu_name
+            or "90" in gpu_name
+            or "M4" in gpu_name.upper()
+            or "T4" in gpu_name.upper()
+            or "TITAN" in gpu_name.upper()
+        ):  # A10#A100#V100#A40#P40#M40#K80#A4500
+            if_gpu_ok = True  # at least one usable NVIDIA GPU
+            gpu_infos.append("%s\t%s" % (i, gpu_name))
+            mem.append(
+                int(
+                    torch.cuda.get_device_properties(i).total_memory
+                    / 1024
+                    / 1024
+                    / 1024
+                    + 0.4
+                )
+            )
+if if_gpu_ok and len(gpu_infos) > 0:
+    gpu_info = "\n".join(gpu_infos)
+    default_batch_size = min(mem) // 2
+else:
+    gpu_info = i18n("很遗憾您这没有能用的显卡来支持您训练")  # i18n key: "Unfortunately, you have no usable GPU to support training"
+    default_batch_size = 1
+gpus = "-".join([i[0] for i in gpu_infos])
+from lib.infer_pack.models import (
+    SynthesizerTrnMs256NSFsid,
+    SynthesizerTrnMs256NSFsid_nono,
+    SynthesizerTrnMs768NSFsid,
+    SynthesizerTrnMs768NSFsid_nono,
+)
+import soundfile as sf
+from fairseq import checkpoint_utils
+import gradio as gr
+import logging
+from vc_infer_pipeline import VC
+from config import Config
+
+config = Config()
+# from trainset_preprocess_pipeline import PreProcess
+logging.getLogger("numba").setLevel(logging.WARNING)
+
+hubert_model = None
+
+def load_hubert():
+    global hubert_model
+    models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
+        ["hubert_base.pt"],
+        suffix="",
+    )
+    hubert_model = models[0]
+    hubert_model = hubert_model.to(config.device)
+    if config.is_half:
+        hubert_model = hubert_model.half()
+    else:
+        hubert_model = hubert_model.float()
+    hubert_model.eval()
+
+weight_root = "weights"
+index_root = "logs"
+names = []
+for name in os.listdir(weight_root):
+    if name.endswith(".pth"):
+        names.append(name)
+index_paths = []
+for root, dirs, files in os.walk(index_root, topdown=False):
+    for name in files:
+        if name.endswith(".index") and "trained" not in name:
+            index_paths.append("%s/%s" % (root, name))
+
+def vc_single(
+    sid,
+    input_audio_path,
+    f0_up_key,
+    f0_file,
+    f0_method,
+    file_index,
+    # file_index2,
+    # file_big_npy,
+    index_rate,
+    filter_radius,
+    resample_sr,
+    rms_mix_rate,
+    protect,
+    crepe_hop_length,
+):  # spk_item, input_audio0, vc_transform0, f0_file, f0method0
+    global tgt_sr, net_g, vc, hubert_model, version
+    if input_audio_path is None:
+        return "You need to upload an audio", None
+    f0_up_key = int(f0_up_key)
+    try:
+        audio = load_audio(input_audio_path, 16000, DoFormant, Quefrency, Timbre)
+        audio_max = np.abs(audio).max() / 0.95
+        if audio_max > 1:
+            audio /= audio_max
+        times = [0, 0, 0]
+        if hubert_model is None:
+            load_hubert()
+        if_f0 = cpt.get("f0", 1)
+        file_index = (
+            file_index.strip(" ")
+            .strip('"')
+            .strip("\n")
+            .strip('"')
+            .strip(" ")
+            .replace("trained", "added")
+        )  # guard against user mistakes by automatically replacing "trained" with "added"
+        # file_big_npy = (
+        #     file_big_npy.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+        # )
+        audio_opt = vc.pipeline(
+            hubert_model,
+            net_g,
+            sid,
+            audio,
+            input_audio_path,
+            times,
+            f0_up_key,
+            f0_method,
+            file_index,
+            # file_big_npy,
+            index_rate,
+            if_f0,
+            filter_radius,
+            tgt_sr,
+            resample_sr,
+            rms_mix_rate,
+            version,
+            protect,
+            crepe_hop_length,
+            f0_file=f0_file,
+        )
+        if resample_sr >= 16000 and tgt_sr != resample_sr:
+            tgt_sr = resample_sr
+        index_info = (
+            "Using index:%s." % file_index
+            if os.path.exists(file_index)
+            else "Index not used."
+        )
+        return "Success.\n %s\nTime:\n npy:%ss, f0:%ss, infer:%ss" % (
+            index_info,
+            times[0],
+            times[1],
+            times[2],
+        ), (tgt_sr, audio_opt)
+    except:
+        info = traceback.format_exc()
+        print(info)
+        return info, (None, None)
+
+
+def vc_multi(
+    sid,
+    dir_path,
+    opt_root,
+    paths,
+    f0_up_key,
+    f0_method,
+    file_index,
+    file_index2,
+    # file_big_npy,
+    index_rate,
+    filter_radius,
+    resample_sr,
+    rms_mix_rate,
+    protect,
+    format1,
+    crepe_hop_length,
+):
+    try:
+        dir_path = (
+            dir_path.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+        )  # strip the spaces, quotes and newlines users may paste around the path
+        opt_root = opt_root.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+        os.makedirs(opt_root, exist_ok=True)
+        try:
+            if dir_path != "":
+                paths = [os.path.join(dir_path, name) for name in os.listdir(dir_path)]
+            else:
+                paths = [path.name for path in paths]
+        except:
+            traceback.print_exc()
+            paths = [path.name for path in paths]
+        infos = []
+        for path in paths:
+            info, opt = vc_single(
+                sid,
+                path,
+                f0_up_key,
+                None,
+                f0_method,
+                file_index,
+                # file_big_npy,
+                index_rate,
+                filter_radius,
+                resample_sr,
+                rms_mix_rate,
+                protect,
+                crepe_hop_length,
+            )
+            if "Success" in info:
+                try:
+                    tgt_sr, audio_opt = opt
+                    if format1 in ["wav", "flac"]:
+                        sf.write(
+                            "%s/%s.%s" % (opt_root, os.path.basename(path), format1),
+                            audio_opt,
+                            tgt_sr,
+                        )
+                    else:
+                        path = "%s/%s.wav" % (opt_root, os.path.basename(path))
+                        sf.write(
+                            path,
+                            audio_opt,
+                            tgt_sr,
+                        )
+                        if os.path.exists(path):
+                            os.system(
+                                "ffmpeg -i %s -vn %s -q:a 2 -y"
+                                % (path, path[:-4] + ".%s" % format1)
+                            )
+                except:
+                    info += traceback.format_exc()
+            infos.append("%s->%s" % (os.path.basename(path), info))
+            yield "\n".join(infos)
+        yield "\n".join(infos)
+    except:
+        yield traceback.format_exc()
+
+# Only one voice can be loaded globally per tab
+def get_vc(sid):
+    global n_spk, tgt_sr, net_g, vc, cpt, version
+    if sid == "" or sid == []:
+        global hubert_model
+        if hubert_model is not None:
+            # polling may switch sid from a loaded model to none, so check for that
+            print("clean_empty_cache")
+            del net_g, n_spk, vc, hubert_model, tgt_sr  # , cpt
+            hubert_model = net_g = n_spk = vc = hubert_model = tgt_sr = None
+            if torch.cuda.is_available():
+                torch.cuda.empty_cache()
+            # without the block below, the cache is not cleaned up completely
+            if_f0 = cpt.get("f0", 1)
+            version = cpt.get("version", "v1")
+            if version == "v1":
+                if if_f0 == 1:
+                    net_g = SynthesizerTrnMs256NSFsid(
+                        *cpt["config"], is_half=config.is_half
+                    )
+                else:
+                    net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
+            elif version == "v2":
+                if if_f0 == 1:
+                    net_g = SynthesizerTrnMs768NSFsid(
+                        *cpt["config"], is_half=config.is_half
+                    )
+                else:
+                    net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
+            del net_g, cpt
+            if torch.cuda.is_available():
+                torch.cuda.empty_cache()
+            cpt = None
+        return {"visible": False, "__type__": "update"}
+    person = "%s/%s" % (weight_root, sid)
+    print("loading %s" % person)
+    cpt = torch.load(person, map_location="cpu")
+    tgt_sr = cpt["config"][-1]
+    cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]  # n_spk
+    if_f0 = cpt.get("f0", 1)
+    version = cpt.get("version", "v1")
+    if version == "v1":
+        if if_f0 == 1:
+            net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
+        else:
+            net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
+    elif version == "v2":
+        if if_f0 == 1:
+            net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
+        else:
+            net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
+    del net_g.enc_q
+    print(net_g.load_state_dict(cpt["weight"], strict=False))
+    net_g.eval().to(config.device)
+    if config.is_half:
+        net_g = net_g.half()
+    else:
+        net_g = net_g.float()
+    vc = VC(tgt_sr, config)
+    n_spk = cpt["config"][-3]
+    return {"visible": False, "maximum": n_spk, "__type__": "update"}
+
+def change_choices():
+    names = []
+    for name in os.listdir(weight_root):
+        if name.endswith(".pth"):
+            names.append(name)
+    index_paths = []
+    for root, dirs, files in os.walk(index_root, topdown=False):
+        for name in files:
+            if name.endswith(".index") and "trained" not in name:
+                index_paths.append("%s/%s" % (root, name))
+    return {"choices": sorted(names), "__type__": "update"}, {
+        "choices": sorted(index_paths),
+        "__type__": "update",
+    }
+
+def clean():
+    return {"value": "", "__type__": "update"}
+
+sr_dict = {
+    "32k": 32000,
+    "40k": 40000,
+    "48k": 48000,
+}
+
+def if_done(done, p):
+    while 1:
+        if p.poll() is None:
+            sleep(0.5)
+        else:
+            break
+    done[0] = True
+
+def if_done_multi(done, ps):
+    while 1:
+        # poll() is None means the process has not finished yet;
+        # keep waiting while any process is still running
+        flag = 1
+        for p in ps:
+            if p.poll() is None:
+                flag = 0
+                sleep(0.5)
+                break
+        if flag == 1:
+            break
+    done[0] = True
+
+
+def preprocess_dataset(trainset_dir, exp_dir, sr, n_p):
+    sr = sr_dict[sr]
+    os.makedirs("%s/logs/%s" % (now_dir, exp_dir), exist_ok=True)
+    f = open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "w")
+    f.close()
+    cmd = (
+        config.python_cmd
+        + " trainset_preprocess_pipeline_print.py %s %s %s %s/logs/%s "
+        % (trainset_dir, sr, n_p, now_dir, exp_dir)
+        + str(config.noparallel)
+    )
+    print(cmd)
+    p = Popen(cmd, shell=True)  # , stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
+    # gradio only reads Popen output once the process has fully finished,
+    # so poll a log file on a timer to stream progress instead
+    done = [False]
+    threading.Thread(
+        target=if_done,
+        args=(
+            done,
+            p,
+        ),
+    ).start()
+    while 1:
+        with open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "r") as f:
+            yield (f.read())
+        sleep(1)
+        if done[0]:
+            break
+    with open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "r") as f:
+        log = f.read()
+    print(log)
+    yield log
+
+# but2.click(extract_f0, [gpus6, np7, f0method8, if_f0_3, trainset_dir4], [info2])
+def extract_f0_feature(gpus, n_p, f0method, if_f0, exp_dir, version19, echl):
+    gpus = gpus.split("-")
+    os.makedirs("%s/logs/%s" % (now_dir, exp_dir), exist_ok=True)
+    f = open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "w")
+    f.close()
+    if if_f0:
+        cmd = config.python_cmd + " extract_f0_print.py %s/logs/%s %s %s %s" % (
+            now_dir,
+            exp_dir,
+            n_p,
+            f0method,
+            echl,
+        )
+        print(cmd)
+        p = Popen(cmd, shell=True, cwd=now_dir)  # , stdin=PIPE, stdout=PIPE, stderr=PIPE
+        # gradio only reads Popen output once the process has fully finished,
+        # so poll the log file on a timer to stream progress instead
+        done = [False]
+        threading.Thread(
+            target=if_done,
+            args=(
+                done,
+                p,
+            ),
+        ).start()
+        while 1:
+            with open(
+                "%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r"
+            ) as f:
+                yield (f.read())
+            sleep(1)
+            if done[0]:
+                break
+        with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
+            log = f.read()
+        print(log)
+        yield log
+    # run a separate process for each part
+    """
+    n_part=int(sys.argv[1])
+    i_part=int(sys.argv[2])
+    i_gpu=sys.argv[3]
+    exp_dir=sys.argv[4]
+    os.environ["CUDA_VISIBLE_DEVICES"]=str(i_gpu)
+    """
+    leng = len(gpus)
+    ps = []
+    for idx, n_g in enumerate(gpus):
+        cmd = (
+            config.python_cmd
+            + " extract_feature_print.py %s %s %s %s %s/logs/%s %s"
+            % (
+                config.device,
+                leng,
+                idx,
+                n_g,
+                now_dir,
+                exp_dir,
+                version19,
+            )
+        )
+        print(cmd)
+        p = Popen(
+            cmd, shell=True, cwd=now_dir
+        )  # , shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
+        ps.append(p)
+    # gradio only reads Popen output once the process has fully finished,
+    # so poll the log file on a timer to stream progress instead
+    done = [False]
+    threading.Thread(
+        target=if_done_multi,
+        args=(
+            done,
+            ps,
+        ),
+    ).start()
+    while 1:
+        with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
+            yield (f.read())
+        sleep(1)
+        if done[0]:
+            break
+    with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
+        log = f.read()
+    print(log)
+    yield log
+
+
+def change_sr2(sr2, if_f0_3, version19):
+    path_str = "" if version19 == "v1" else "_v2"
+    f0_str = "f0" if if_f0_3 else ""
+    if_pretrained_generator_exist = os.access("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK)
+    if_pretrained_discriminator_exist = os.access("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK)
+    if not if_pretrained_generator_exist:
+        print("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), "does not exist, will not use pretrained model")
+    if not if_pretrained_discriminator_exist:
+        print("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), "does not exist, will not use pretrained model")
+    return (
+        ("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_generator_exist else "",
+        ("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_discriminator_exist else "",
+        {"visible": True, "__type__": "update"},
+    )
+
+def change_version19(sr2, if_f0_3, version19):
+    path_str = "" if version19 == "v1" else "_v2"
+    f0_str = "f0" if if_f0_3 else ""
+    if_pretrained_generator_exist = os.access("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK)
+    if_pretrained_discriminator_exist = os.access("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK)
+    if not if_pretrained_generator_exist:
+        print("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), "does not exist, will not use pretrained model")
+    if not if_pretrained_discriminator_exist:
+        print("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), "does not exist, will not use pretrained model")
+    return (
+        ("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_generator_exist else "",
+        ("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_discriminator_exist else "",
+    )
+
+def change_f0(if_f0_3, sr2, version19):  # f0method8, pretrained_G14, pretrained_D15
+    path_str = "" if version19 == "v1" else "_v2"
+    if_pretrained_generator_exist = os.access("pretrained%s/f0G%s.pth" % (path_str, sr2), os.F_OK)
+    if_pretrained_discriminator_exist = os.access("pretrained%s/f0D%s.pth" % (path_str, sr2), os.F_OK)
+    if not if_pretrained_generator_exist:
+        print("pretrained%s/f0G%s.pth" % (path_str, sr2), "does not exist, will not use pretrained model")
+    if not if_pretrained_discriminator_exist:
+        print("pretrained%s/f0D%s.pth" % (path_str, sr2), "does not exist, will not use pretrained model")
+    if if_f0_3:
+        return (
+            {"visible": True, "__type__": "update"},
+            "pretrained%s/f0G%s.pth" % (path_str, sr2) if if_pretrained_generator_exist else "",
+            "pretrained%s/f0D%s.pth" % (path_str, sr2) if if_pretrained_discriminator_exist else "",
+        )
+    return (
+        {"visible": False, "__type__": "update"},
+        ("pretrained%s/G%s.pth" % (path_str, sr2)) if if_pretrained_generator_exist else "",
+        ("pretrained%s/D%s.pth" % (path_str, sr2)) if if_pretrained_discriminator_exist else "",
+    )
+
+global log_interval
+
+def set_log_interval(exp_dir, batch_size12):
+    log_interval = 1
+
+    folder_path = os.path.join(exp_dir, "1_16k_wavs")
+
+    if os.path.exists(folder_path) and os.path.isdir(folder_path):
+        wav_files = [f for f in os.listdir(folder_path) if f.endswith(".wav")]
+        if wav_files:
+            sample_size = len(wav_files)
+            log_interval = math.ceil(sample_size / batch_size12)
+            if log_interval > 1:
+                log_interval += 1
+    return log_interval
+
+# but3.click(click_train, [exp_dir1, sr2, if_f0_3, save_epoch10, total_epoch11, batch_size12, if_save_latest13, pretrained_G14, pretrained_D15, gpus16])
+def click_train(
+    exp_dir1,
+    sr2,
+    if_f0_3,
+    spk_id5,
+    save_epoch10,
+    total_epoch11,
+    batch_size12,
+    if_save_latest13,
+    pretrained_G14,
+    pretrained_D15,
+    gpus16,
+    if_cache_gpu17,
+    if_save_every_weights18,
+    version19,
+):
+    CSVutil('csvdb/stop.csv', 'w+', 'formanting', False)
+    # generate the filelist
+    exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
+    os.makedirs(exp_dir, exist_ok=True)
+    gt_wavs_dir = "%s/0_gt_wavs" % (exp_dir)
+    feature_dir = (
+        "%s/3_feature256" % (exp_dir)
+        if version19 == "v1"
+        else "%s/3_feature768" % (exp_dir)
+    )
+
+    log_interval = set_log_interval(exp_dir, batch_size12)
+
+    if if_f0_3:
+        f0_dir = "%s/2a_f0" % (exp_dir)
+        f0nsf_dir = "%s/2b-f0nsf" % (exp_dir)
+        names = (
+            set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
+            & set([name.split(".")[0] for name in os.listdir(feature_dir)])
+            & set([name.split(".")[0] for name in os.listdir(f0_dir)])
+            & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
+        )
+    else:
+        names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
+            [name.split(".")[0] for name in os.listdir(feature_dir)]
+        )
+    opt = []
+    for name in names:
+        if if_f0_3:
+            opt.append(
+                "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
+                % (
+                    gt_wavs_dir.replace("\\", "\\\\"),
+                    name,
+                    feature_dir.replace("\\", "\\\\"),
+                    name,
+                    f0_dir.replace("\\", "\\\\"),
+                    name,
+                    f0nsf_dir.replace("\\", "\\\\"),
+                    name,
+                    spk_id5,
+                )
+            )
+        else:
+            opt.append(
+                "%s/%s.wav|%s/%s.npy|%s"
+                % (
+                    gt_wavs_dir.replace("\\", "\\\\"),
+                    name,
+                    feature_dir.replace("\\", "\\\\"),
+                    name,
+                    spk_id5,
+                )
+            )
+    fea_dim = 256 if version19 == "v1" else 768
+    if if_f0_3:
+        for _ in range(2):
+            opt.append(
+                "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
+                % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
+            )
+    else:
+        for _ in range(2):
+            opt.append(
+                "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
+                % (now_dir, sr2, now_dir, fea_dim, spk_id5)
+            )
+    shuffle(opt)
+    with open("%s/filelist.txt" % exp_dir, "w") as f:
+        f.write("\n".join(opt))
+    print("write filelist done")
+    # no config file needs to be generated
+    # cmd = python_cmd + " train_nsf_sim_cache_sid_load_pretrain.py -e mi-test -sr 40k -f0 1 -bs 4 -g 0 -te 10 -se 5 -pg pretrained/f0G40k.pth -pd pretrained/f0D40k.pth -l 1 -c 0"
+    print("use gpus:", gpus16)
+    if pretrained_G14 == "":
+        print("no pretrained Generator")
+    if pretrained_D15 == "":
+        print("no pretrained Discriminator")
+    if gpus16:
+        cmd = (
+            config.python_cmd
+            + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s -li %s"
+            % (
+                exp_dir1,
+                sr2,
+                1 if if_f0_3 else 0,
+                batch_size12,
+                gpus16,
+                total_epoch11,
+                save_epoch10,
+                ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
+                ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
+                1 if if_save_latest13 else 0,
+                1 if if_cache_gpu17 else 0,
+                1 if if_save_every_weights18 else 0,
+                version19,
+                log_interval,
+            )
+        )
+    else:
+        cmd = (
+            config.python_cmd
+            + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s -li %s"
+            % (
+                exp_dir1,
+                sr2,
+                1 if if_f0_3 else 0,
+                batch_size12,
+                total_epoch11,
+                save_epoch10,
+                ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "\b",
+                ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "\b",
+                1 if if_save_latest13 else 0,
+                1 if if_cache_gpu17 else 0,
+                1 if if_save_every_weights18 else 0,
+                version19,
+                log_interval,
+            )
+        )
+    print(cmd)
+    p = Popen(cmd, shell=True, cwd=now_dir)
+    global PID
+    PID = p.pid
+    p.wait()
+    return ("Training finished; check the console log or train.log in the experiment folder", {"visible": False, "__type__": "update"}, {"visible": True, "__type__": "update"})
863
+
864
+
865
+ # but4.click(train_index, [exp_dir1], info3)
+ def train_index(exp_dir1, version19):
+     exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
+     os.makedirs(exp_dir, exist_ok=True)
+     feature_dir = (
+         "%s/3_feature256" % (exp_dir)
+         if version19 == "v1"
+         else "%s/3_feature768" % (exp_dir)
+     )
+     # this function is a generator, so report errors via yield, not return
+     if not os.path.exists(feature_dir):
+         yield "Please run feature extraction first!"
+         return
+     listdir_res = list(os.listdir(feature_dir))
+     if len(listdir_res) == 0:
+         yield "Please run feature extraction first!"
+         return
+     npys = []
+     for name in sorted(listdir_res):
+         phone = np.load("%s/%s" % (feature_dir, name))
+         npys.append(phone)
+     big_npy = np.concatenate(npys, 0)
+     big_npy_idx = np.arange(big_npy.shape[0])
+     np.random.shuffle(big_npy_idx)
+     big_npy = big_npy[big_npy_idx]
+     np.save("%s/total_fea.npy" % exp_dir, big_npy)
+     # n_ivf = big_npy.shape[0] // 39
+     n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
+     infos = []
+     infos.append("%s,%s" % (big_npy.shape, n_ivf))
+     yield "\n".join(infos)
+     index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
+     # index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,PQ128x4fs,RFlat" % n_ivf)
+     infos.append("training")
+     yield "\n".join(infos)
+     index_ivf = faiss.extract_index_ivf(index)
+     index_ivf.nprobe = 1
+     index.train(big_npy)
+     faiss.write_index(
+         index,
+         "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
+         % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
+     )
+     # faiss.write_index(index, '%s/trained_IVF%s_Flat_FastScan_%s.index' % (exp_dir, n_ivf, version19))
+     infos.append("adding")
+     yield "\n".join(infos)
+     batch_size_add = 8192
+     for i in range(0, big_npy.shape[0], batch_size_add):
+         index.add(big_npy[i : i + batch_size_add])
+     faiss.write_index(
+         index,
+         "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
+         % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
+     )
+     infos.append(
+         "Index built successfully: added_IVF%s_Flat_nprobe_%s_%s_%s.index"
+         % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
+     )
+     # faiss.write_index(index, '%s/added_IVF%s_Flat_FastScan_%s.index' % (exp_dir, n_ivf, version19))
+     # infos.append("Index built successfully: added_IVF%s_Flat_FastScan_%s.index" % (n_ivf, version19))
+     yield "\n".join(infos)
+
+
+ # but5.click(train1key, [exp_dir1, sr2, if_f0_3, trainset_dir4, spk_id5, gpus6, np7, f0method8, save_epoch10, total_epoch11, batch_size12, if_save_latest13, pretrained_G14, pretrained_D15, gpus16, if_cache_gpu17], info3)
+ def train1key(
+     exp_dir1,
+     sr2,
+     if_f0_3,
+     trainset_dir4,
+     spk_id5,
+     np7,
+     f0method8,
+     save_epoch10,
+     total_epoch11,
+     batch_size12,
+     if_save_latest13,
+     pretrained_G14,
+     pretrained_D15,
+     gpus16,
+     if_cache_gpu17,
+     if_save_every_weights18,
+     version19,
+     echl,
+ ):
+     infos = []
+
+     def get_info_str(strr):
+         infos.append(strr)
+         return "\n".join(infos)
+
+     model_log_dir = "%s/logs/%s" % (now_dir, exp_dir1)
+     preprocess_log_path = "%s/preprocess.log" % model_log_dir
+     extract_f0_feature_log_path = "%s/extract_f0_feature.log" % model_log_dir
+     gt_wavs_dir = "%s/0_gt_wavs" % model_log_dir
+     feature_dir = (
+         "%s/3_feature256" % model_log_dir
+         if version19 == "v1"
+         else "%s/3_feature768" % model_log_dir
+     )
+
+     os.makedirs(model_log_dir, exist_ok=True)
+     ######### step 1: preprocess the dataset
+     open(preprocess_log_path, "w").close()
+     cmd = (
+         config.python_cmd
+         + " trainset_preprocess_pipeline_print.py %s %s %s %s "
+         % (trainset_dir4, sr_dict[sr2], np7, model_log_dir)
+         + str(config.noparallel)
+     )
+     yield get_info_str(i18n("step1:正在处理数据"))
+     yield get_info_str(cmd)
+     p = Popen(cmd, shell=True)
+     p.wait()
+     with open(preprocess_log_path, "r") as f:
+         print(f.read())
+     ######### step 2a: extract pitch
+     open(extract_f0_feature_log_path, "w").close()
+     if if_f0_3:
+         yield get_info_str("step2a: extracting pitch")
+         cmd = config.python_cmd + " extract_f0_print.py %s %s %s %s" % (
+             model_log_dir,
+             np7,
+             f0method8,
+             echl,
+         )
+         yield get_info_str(cmd)
+         p = Popen(cmd, shell=True, cwd=now_dir)
+         p.wait()
+         with open(extract_f0_feature_log_path, "r") as f:
+             print(f.read())
+     else:
+         yield get_info_str(i18n("step2a:无需提取音高"))
+     ####### step 2b: extract features
+     yield get_info_str(i18n("step2b:正在提取特征"))
+     gpus = gpus16.split("-")
+     leng = len(gpus)
+     ps = []
+     for idx, n_g in enumerate(gpus):
+         cmd = config.python_cmd + " extract_feature_print.py %s %s %s %s %s %s" % (
+             config.device,
+             leng,
+             idx,
+             n_g,
+             model_log_dir,
+             version19,
+         )
+         yield get_info_str(cmd)
+         p = Popen(
+             cmd, shell=True, cwd=now_dir
+         )  # , shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
+         ps.append(p)
+     for p in ps:
+         p.wait()
+     with open(extract_f0_feature_log_path, "r") as f:
+         print(f.read())
+     ####### step 3a: train the model
+     yield get_info_str(i18n("step3a:正在训练模型"))
+     # generate filelist
+     if if_f0_3:
+         f0_dir = "%s/2a_f0" % model_log_dir
+         f0nsf_dir = "%s/2b-f0nsf" % model_log_dir
+         names = (
+             set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
+             & set([name.split(".")[0] for name in os.listdir(feature_dir)])
+             & set([name.split(".")[0] for name in os.listdir(f0_dir)])
+             & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
+         )
+     else:
+         names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
+             [name.split(".")[0] for name in os.listdir(feature_dir)]
+         )
+     opt = []
+     for name in names:
+         if if_f0_3:
+             opt.append(
+                 "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
+                 % (
+                     gt_wavs_dir.replace("\\", "\\\\"),
+                     name,
+                     feature_dir.replace("\\", "\\\\"),
+                     name,
+                     f0_dir.replace("\\", "\\\\"),
+                     name,
+                     f0nsf_dir.replace("\\", "\\\\"),
+                     name,
+                     spk_id5,
+                 )
+             )
+         else:
+             opt.append(
+                 "%s/%s.wav|%s/%s.npy|%s"
+                 % (
+                     gt_wavs_dir.replace("\\", "\\\\"),
+                     name,
+                     feature_dir.replace("\\", "\\\\"),
+                     name,
+                     spk_id5,
+                 )
+             )
+     fea_dim = 256 if version19 == "v1" else 768
+     if if_f0_3:
+         for _ in range(2):
+             opt.append(
+                 "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
+                 % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
+             )
+     else:
+         for _ in range(2):
+             opt.append(
+                 "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
+                 % (now_dir, sr2, now_dir, fea_dim, spk_id5)
+             )
+     shuffle(opt)
+     with open("%s/filelist.txt" % model_log_dir, "w") as f:
+         f.write("\n".join(opt))
+     yield get_info_str("write filelist done")
+     if gpus16:
+         cmd = (
+             config.python_cmd
+             + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
+             % (
+                 exp_dir1,
+                 sr2,
+                 1 if if_f0_3 else 0,
+                 batch_size12,
+                 gpus16,
+                 total_epoch11,
+                 save_epoch10,
+                 ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
+                 ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
+                 1 if if_save_latest13 == True else 0,
+                 1 if if_cache_gpu17 == True else 0,
+                 1 if if_save_every_weights18 == True else 0,
+                 version19,
+             )
+         )
+     else:
+         cmd = (
+             config.python_cmd
+             + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
+             % (
+                 exp_dir1,
+                 sr2,
+                 1 if if_f0_3 else 0,
+                 batch_size12,
+                 total_epoch11,
+                 save_epoch10,
+                 ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
+                 ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
+                 1 if if_save_latest13 == True else 0,
+                 1 if if_cache_gpu17 == True else 0,
+                 1 if if_save_every_weights18 == True else 0,
+                 version19,
+             )
+         )
+     yield get_info_str(cmd)
+     p = Popen(cmd, shell=True, cwd=now_dir)
+     p.wait()
+     yield get_info_str(i18n("训练结束, 您可查看控制台训练日志或实验文件夹下的train.log"))
+     ####### step 3b: train the index
+     npys = []
+     listdir_res = list(os.listdir(feature_dir))
+     for name in sorted(listdir_res):
+         phone = np.load("%s/%s" % (feature_dir, name))
+         npys.append(phone)
+     big_npy = np.concatenate(npys, 0)
+
+     big_npy_idx = np.arange(big_npy.shape[0])
+     np.random.shuffle(big_npy_idx)
+     big_npy = big_npy[big_npy_idx]
+     np.save("%s/total_fea.npy" % model_log_dir, big_npy)
+
+     # n_ivf = big_npy.shape[0] // 39
+     n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
+     yield get_info_str("%s,%s" % (big_npy.shape, n_ivf))
+     index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
+     yield get_info_str("training index")
+     index_ivf = faiss.extract_index_ivf(index)
+     index_ivf.nprobe = 1
+     index.train(big_npy)
+     faiss.write_index(
+         index,
+         "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
+         % (model_log_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
+     )
+     yield get_info_str("adding index")
+     batch_size_add = 8192
+     for i in range(0, big_npy.shape[0], batch_size_add):
+         index.add(big_npy[i : i + batch_size_add])
+     faiss.write_index(
+         index,
+         "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
+         % (model_log_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
+     )
+     yield get_info_str(
+         "Index built successfully: added_IVF%s_Flat_nprobe_%s_%s_%s.index"
+         % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
+     )
+     yield get_info_str(i18n("全流程结束!"))
+
+
+ def whethercrepeornah(radio):
+     mango = radio in ("mangio-crepe", "mangio-crepe-tiny")
+     return {"visible": mango, "__type__": "update"}
+
+
+ # ckpt_path2.change(change_info_, [ckpt_path2], [sr__, if_f0__])
+ def change_info_(ckpt_path):
+     if not os.path.exists(ckpt_path.replace(os.path.basename(ckpt_path), "train.log")):
+         return {"__type__": "update"}, {"__type__": "update"}, {"__type__": "update"}
+     try:
+         with open(
+             ckpt_path.replace(os.path.basename(ckpt_path), "train.log"), "r"
+         ) as f:
+             info = eval(f.read().strip("\n").split("\n")[0].split("\t")[-1])
+             sr, f0 = info["sample_rate"], info["if_f0"]
+             version = "v2" if ("version" in info and info["version"] == "v2") else "v1"
+             return sr, str(f0), version
+     except:
+         traceback.print_exc()
+         return {"__type__": "update"}, {"__type__": "update"}, {"__type__": "update"}
+
+
1187
+ from lib.infer_pack.models_onnx import SynthesizerTrnMsNSFsidM
1188
+
1189
+
1190
+ def export_onnx(ModelPath, ExportedPath, MoeVS=True):
1191
+ cpt = torch.load(ModelPath, map_location="cpu")
1192
+ cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0] # n_spk
1193
+ hidden_channels = 256 if cpt.get("version","v1")=="v1"else 768#cpt["config"][-2] # hidden_channels,为768Vec做准备
1194
+
1195
+ test_phone = torch.rand(1, 200, hidden_channels) # hidden unit
1196
+ test_phone_lengths = torch.tensor([200]).long() # hidden unit 长度(貌似没啥用)
1197
+ test_pitch = torch.randint(size=(1, 200), low=5, high=255) # 基频(单位赫兹)
1198
+ test_pitchf = torch.rand(1, 200) # nsf基频
1199
+ test_ds = torch.LongTensor([0]) # 说话人ID
1200
+ test_rnd = torch.rand(1, 192, 200) # 噪声(加入随机因子)
1201
+
1202
+ device = "cpu" # 导出时设备(不影响使用模型)
1203
+
1204
+
1205
+ net_g = SynthesizerTrnMsNSFsidM(
1206
+ *cpt["config"], is_half=False,version=cpt.get("version","v1")
1207
+ ) # fp32导出(C++要支持fp16必须手动将内存重新排列所以暂时不用fp16)
1208
+ net_g.load_state_dict(cpt["weight"], strict=False)
1209
+ input_names = ["phone", "phone_lengths", "pitch", "pitchf", "ds", "rnd"]
1210
+ output_names = [
1211
+ "audio",
1212
+ ]
1213
+ # net_g.construct_spkmixmap(n_speaker) 多角色混合轨道导出
1214
+ torch.onnx.export(
1215
+ net_g,
1216
+ (
1217
+ test_phone.to(device),
1218
+ test_phone_lengths.to(device),
1219
+ test_pitch.to(device),
1220
+ test_pitchf.to(device),
1221
+ test_ds.to(device),
1222
+ test_rnd.to(device),
1223
+ ),
1224
+ ExportedPath,
1225
+ dynamic_axes={
1226
+ "phone": [1],
1227
+ "pitch": [1],
1228
+ "pitchf": [1],
1229
+ "rnd": [2],
1230
+ },
1231
+ do_constant_folding=False,
1232
+ opset_version=16,
1233
+ verbose=False,
1234
+ input_names=input_names,
1235
+ output_names=output_names,
1236
+ )
1237
+ return "Finished"
1238
+
1239
+ # region RVC WebUI App
+
+ def get_presets():
+     data = None
+     with open('../inference-presets.json', 'r') as file:
+         data = json.load(file)
+     preset_names = []
+     for preset in data['presets']:
+         preset_names.append(preset['name'])
+     return preset_names
+
+ def change_choices2():
+     audio_files = []
+     for filename in os.listdir("./audios"):
+         if filename.endswith(('.wav', '.mp3', '.ogg', '.flac', '.m4a', '.aac', '.mp4')):
+             audio_files.append(os.path.join('./audios', filename).replace('\\', '/'))
+     return {"choices": sorted(audio_files), "__type__": "update"}, {"__type__": "update"}
+
+ audio_files = []
+ for filename in os.listdir("./audios"):
+     if filename.endswith(('.wav', '.mp3', '.ogg', '.flac', '.m4a', '.aac', '.mp4')):
+         audio_files.append(os.path.join('./audios', filename).replace('\\', '/'))
+
+ def get_index():
+     if check_for_name() != '':
+         chosen_model = sorted(names)[0].split(".")[0]
+         logs_path = "./logs/" + chosen_model
+         if os.path.exists(logs_path):
+             for file in os.listdir(logs_path):
+                 if file.endswith(".index"):
+                     return os.path.join(logs_path, file)
+             return ''
+         else:
+             return ''
+
+ def get_indexes():
+     indexes_list = []
+     for dirpath, dirnames, filenames in os.walk("./logs/"):
+         for filename in filenames:
+             if filename.endswith(".index"):
+                 indexes_list.append(os.path.join(dirpath, filename))
+     if len(indexes_list) > 0:
+         return indexes_list
+     else:
+         return ''
+
+ def get_name():
+     if len(audio_files) > 0:
+         return sorted(audio_files)[0]
+     else:
+         return ''
+
+ def save_to_wav(record_button):
+     if record_button is None:
+         pass
+     else:
+         path_to_file = record_button
+         new_name = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + '.wav'
+         new_path = './audios/' + new_name
+         shutil.move(path_to_file, new_path)
+         return new_path
+
+ def save_to_wav2(dropbox):
+     file_path = dropbox.name
+     shutil.move(file_path, './audios')
+     return os.path.join('./audios', os.path.basename(file_path))
+
+ def match_index(sid0):
+     folder = sid0.split(".")[0]
+     parent_dir = "./logs/" + folder
+     if os.path.exists(parent_dir):
+         for filename in os.listdir(parent_dir):
+             if filename.endswith(".index"):
+                 index_path = os.path.join(parent_dir, filename)
+                 return index_path
+     else:
+         return ''
+
+ def check_for_name():
+     if len(names) > 0:
+         return sorted(names)[0]
+     else:
+         return ''
+
+ def download_from_url(url, model):
+     if url == '':
+         return "URL cannot be left empty."
+     if model == '':
+         return "You need to name your model. For example: My-Model"
+     url = url.strip()
+     zip_dirs = ["zips", "unzips"]
+     for directory in zip_dirs:
+         if os.path.exists(directory):
+             shutil.rmtree(directory)
+     os.makedirs("zips", exist_ok=True)
+     os.makedirs("unzips", exist_ok=True)
+     zipfile = model + '.zip'
+     zipfile_path = './zips/' + zipfile
+     try:
+         if "drive.google.com" in url:
+             subprocess.run(["gdown", url, "--fuzzy", "-O", zipfile_path])
+         elif "mega.nz" in url:
+             m = Mega()
+             m.download_url(url, './zips')
+         else:
+             subprocess.run(["wget", url, "-O", zipfile_path])
+         for filename in os.listdir("./zips"):
+             if filename.endswith(".zip"):
+                 zipfile_path = os.path.join("./zips/", filename)
+                 shutil.unpack_archive(zipfile_path, "./unzips", 'zip')
+             else:
+                 return "No zipfile found."
+         for root, dirs, files in os.walk('./unzips'):
+             for file in files:
+                 file_path = os.path.join(root, file)
+                 if file.endswith(".index"):
+                     os.makedirs(f'./logs/{model}', exist_ok=True)
+                     shutil.copy2(file_path, f'./logs/{model}')
+                 elif "G_" not in file and "D_" not in file and file.endswith(".pth"):
+                     shutil.copy(file_path, f'./weights/{model}.pth')
+         shutil.rmtree("zips")
+         shutil.rmtree("unzips")
+         return "Success."
+     except:
+         return "There's been an error."
+
+ def success_message(face):
+     return f'{face.name} has been uploaded.', 'None'
+
+ def mouth(size, face, voice, faces):
+     if size == 'Half':
+         size = 2
+     else:
+         size = 1
+     if faces == 'None':
+         character = face.name
+     else:
+         if faces == 'Ben Shapiro':
+             character = '/content/wav2lip-HD/inputs/ben-shapiro-10.mp4'
+         elif faces == 'Andrew Tate':
+             character = '/content/wav2lip-HD/inputs/tate-7.mp4'
+     command = "python inference.py " \
+               "--checkpoint_path checkpoints/wav2lip.pth " \
+               f"--face {character} " \
+               f"--audio {voice} " \
+               "--pads 0 20 0 0 " \
+               "--outfile /content/wav2lip-HD/outputs/result.mp4 " \
+               "--fps 24 " \
+               f"--resize_factor {size}"
+     process = subprocess.Popen(command, shell=True, cwd='/content/wav2lip-HD/Wav2Lip-master')
+     stdout, stderr = process.communicate()
+     return '/content/wav2lip-HD/outputs/result.mp4', 'Animation completed.'
+
+ eleven_voices = ['Adam', 'Antoni', 'Josh', 'Arnold', 'Sam', 'Bella', 'Rachel', 'Domi', 'Elli']
+ eleven_voices_ids = [
+     'pNInz6obpgDQGcFmaJgB', 'ErXwobaYiN019PkySvjV', 'TxGEqnHWrfWFTfGW9XjX',
+     'VR6AewLTigWG4xSOukaG', 'yoZ06aMxZJJ28mfd3POQ', 'EXAVITQu4vr4xnSDxMaL',
+     '21m00Tcm4TlvDq8ikWAM', 'AZnzlk1XvdvUeBnXmlld', 'MF3mGyEYCl7XYWbV9V6O',
+ ]
+ chosen_voice = dict(zip(eleven_voices, eleven_voices_ids))
+
+ def stoptraining(mim):
+     if int(mim) == 1:
+         try:
+             CSVutil('csvdb/stop.csv', 'w+', 'stop', 'True')
+             os.kill(PID, signal.SIGTERM)
+         except Exception as e:
+             print(f"Couldn't click due to {e}")
+     return (
+         {"visible": False, "__type__": "update"},
+         {"visible": True, "__type__": "update"},
+     )
+
+
+ def elevenTTS(xiapi, text, id, lang):
+     if xiapi != '' and id != '':
+         choice = chosen_voice[id]
+         CHUNK_SIZE = 1024
+         url = f"https://api.elevenlabs.io/v1/text-to-speech/{choice}"
+         headers = {
+             "Accept": "audio/mpeg",
+             "Content-Type": "application/json",
+             "xi-api-key": xiapi
+         }
+         if lang == 'en':
+             data = {
+                 "text": text,
+                 "model_id": "eleven_monolingual_v1",
+                 "voice_settings": {
+                     "stability": 0.5,
+                     "similarity_boost": 0.5
+                 }
+             }
+         else:
+             data = {
+                 "text": text,
+                 "model_id": "eleven_multilingual_v1",
+                 "voice_settings": {
+                     "stability": 0.5,
+                     "similarity_boost": 0.5
+                 }
+             }
+
+         response = requests.post(url, json=data, headers=headers)
+         with open('./temp_eleven.mp3', 'wb') as f:
+             for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
+                 if chunk:
+                     f.write(chunk)
+         aud_path = save_to_wav('./temp_eleven.mp3')
+         return aud_path, aud_path
+     else:
+         tts = gTTS(text, lang=lang)
+         tts.save('./temp_gTTS.mp3')
+         aud_path = save_to_wav('./temp_gTTS.mp3')
+         return aud_path, aud_path
+
+ def upload_to_dataset(files, dir):
+     if dir == '':
+         dir = './dataset'
+     if not os.path.exists(dir):
+         os.makedirs(dir)
+     count = 0
+     for file in files:
+         path = file.name
+         shutil.copy2(path, dir)
+         count += 1
+     return f' {count} files uploaded to {dir}.'
+
+ def zip_downloader(model):
+     if not os.path.exists(f'./weights/{model}.pth'):
+         return {"__type__": "update"}, f'Make sure the Voice Name is correct. I could not find {model}.pth'
+     index_found = False
+     for file in os.listdir(f'./logs/{model}'):
+         if file.endswith('.index') and 'added' in file:
+             log_file = file
+             index_found = True
+     if index_found:
+         return [f'./weights/{model}.pth', f'./logs/{model}/{log_file}'], "Done"
+     else:
+         return f'./weights/{model}.pth', "Could not find Index file."
+
+ with gr.Blocks(theme=gr.themes.Base(), title='Mangio-RVC-Web 💻') as app:
+     with gr.Tabs():
+         with gr.TabItem("Interface"):
+             gr.HTML("<h1> RVC V2 Huggingface Version </h1>")
+             gr.HTML("<h10> Hugging Face version created by Clebersla </h10>")
+             gr.HTML("<h4> If you want to use this Space privately, I recommend duplicating it. </h4>")
+
+             # Inference Preset Row
+             # with gr.Row():
+             #     mangio_preset = gr.Dropdown(label="Inference Preset", choices=sorted(get_presets()))
+             #     mangio_preset_name_save = gr.Textbox(
+             #         label="Your preset name"
+             #     )
+             #     mangio_preset_save_btn = gr.Button('Save Preset', variant="primary")
+
+             # Other RVC stuff
+             with gr.Row():
+                 sid0 = gr.Dropdown(label="1. Choose your model.", choices=sorted(names), value=check_for_name())
+                 refresh_button = gr.Button("Refresh", variant="primary")
+                 if check_for_name() != '':
+                     get_vc(sorted(names)[0])
+                 vc_transform0 = gr.Number(label="Optional: you can change the pitch here, or leave it at 0.", value=0)
+                 # clean_button = gr.Button(i18n("卸载音色省显存"), variant="primary")
+                 spk_item = gr.Slider(
+                     minimum=0,
+                     maximum=2333,
+                     step=1,
+                     label=i18n("请选择说话人id"),
+                     value=0,
+                     visible=False,
+                     interactive=True,
+                 )
+                 # clean_button.click(fn=clean, inputs=[], outputs=[sid0])
+                 sid0.change(
+                     fn=get_vc,
+                     inputs=[sid0],
+                     outputs=[spk_item],
+                 )
+                 but0 = gr.Button("Convert", variant="primary")
+             with gr.Row():
+                 with gr.Column():
+                     with gr.Row():
+                         dropbox = gr.File(label='Drop an audio file here, then press the "Refresh" button.')
+                     with gr.Row():
+                         record_button = gr.Audio(source="microphone", label="Record audio from the microphone", type="filepath")
+                     with gr.Row():
+                         input_audio0 = gr.Dropdown(
+                             label="2. Choose an audio file.",
+                             value="./audios/someguy.mp3",
+                             choices=audio_files
+                         )
+                         dropbox.upload(fn=save_to_wav2, inputs=[dropbox], outputs=[input_audio0])
+                         dropbox.upload(fn=change_choices2, inputs=[], outputs=[input_audio0])
+                         refresh_button2 = gr.Button("Refresh", variant="primary", size='sm')
+                         record_button.change(fn=save_to_wav, inputs=[record_button], outputs=[input_audio0])
+                         record_button.change(fn=change_choices2, inputs=[], outputs=[input_audio0])
+                     with gr.Row():
+                         with gr.Accordion('Text to speech', open=False):
+                             with gr.Column():
+                                 lang = gr.Radio(label='Chinese and Japanese currently do not work with ElevenLabs.', choices=['en', 'es', 'ru', 'uk', 'pl', 'fr', 'de', 'tr'], value='en')
+                                 api_box = gr.Textbox(label="Enter your ElevenLabs API key, or leave it empty to use Google TTS", value='', visible=False)
+                                 elevenid = gr.Dropdown(label="Voice:", choices=eleven_voices)
+                             with gr.Column():
+                                 tfs = gr.Textbox(label="Enter your text", interactive=True, value="This is a test.")
+                                 tts_button = gr.Button(value="Generate")
+                                 tts_button.click(fn=elevenTTS, inputs=[api_box, tfs, elevenid, lang], outputs=[record_button, input_audio0])
+                     with gr.Row():
+                         with gr.Accordion('Wav2Lip', open=False, visible=False):
+                             with gr.Row():
+                                 size = gr.Radio(label='Resolution:', choices=['Half', 'Full'])
+                                 face = gr.UploadButton("Upload A Character", type='file')
+                                 faces = gr.Dropdown(label="OR Choose one:", choices=['None', 'Ben Shapiro', 'Andrew Tate'])
+                             with gr.Row():
+                                 preview = gr.Textbox(label="Status:", interactive=False)
+                                 face.upload(fn=success_message, inputs=[face], outputs=[preview, faces])
+                             with gr.Row():
+                                 animation = gr.Video(type='filepath')
+                                 refresh_button2.click(fn=change_choices2, inputs=[], outputs=[input_audio0, animation])
+                             with gr.Row():
+                                 animate_button = gr.Button('Animate')
+
+                 with gr.Column():
+                     with gr.Accordion("Index settings", open=False):
+                         file_index1 = gr.Dropdown(
+                             label="3. Path to the added.index file (if it was not found automatically).",
+                             choices=get_indexes(),
+                             value=get_index(),
+                             interactive=True,
+                         )
+                         sid0.change(fn=match_index, inputs=[sid0], outputs=[file_index1])
+                         refresh_button.click(
+                             fn=change_choices, inputs=[], outputs=[sid0, file_index1]
+                         )
+                         # file_big_npy1 = gr.Textbox(
+                         #     label=i18n("特征文件路径"),
+                         #     value="E:\\codes\py39\\vits_vc_gpu_train\\logs\\mi-test-1key\\total_fea.npy",
+                         #     interactive=True,
+                         # )
+                         index_rate1 = gr.Slider(
+                             minimum=0,
+                             maximum=1,
+                             label=i18n("Feature search ratio (I recommend setting it to 0):"),
+                             value=0.66,
+                             interactive=True,
+                         )
+                     vc_output2 = gr.Audio(
+                         label="Output audio (click the three dots in the corner to download)",
+                         type='filepath',
+                         interactive=False,
+                     )
+                     animate_button.click(fn=mouth, inputs=[size, face, vc_output2, faces], outputs=[animation, preview])
+                     with gr.Accordion("Advanced settings", open=False):
+                         f0method0 = gr.Radio(
+                             label='Optional: change the pitch extraction algorithm. The extraction methods are sorted from "worst quality" to "best quality". mangio-crepe may or may not beat rmvpe in cases where "smoothness" matters more, but overall rmvpe is the best.',
+                             choices=["pm", "dio", "crepe-tiny", "mangio-crepe-tiny", "crepe", "harvest", "mangio-crepe", "rmvpe"],  # Fork Feature. Add Crepe-Tiny
+                             value="rmvpe",
+                             interactive=True,
+                         )
+
+                         crepe_hop_length = gr.Slider(
+                             minimum=1,
+                             maximum=512,
+                             step=1,
+                             label="Mangio-Crepe hop length. Higher values reduce the chance of extreme pitch changes, while lower values increase accuracy. 64-192 is a good range to experiment with.",
+                             value=120,
+                             interactive=True,
+                             visible=False,
+                         )
+                         f0method0.change(fn=whethercrepeornah, inputs=[f0method0], outputs=[crepe_hop_length])
+                         filter_radius0 = gr.Slider(
+                             minimum=0,
+                             maximum=7,
+                             label=i18n("If >= 3: apply median filtering to the harvested pitch results. The value is the filter radius; it can reduce breathiness."),
+                             value=3,
+                             step=1,
+                             interactive=True,
+                         )
+                         resample_sr0 = gr.Slider(
+                             minimum=0,
+                             maximum=48000,
+                             label=i18n("后处理重采样至最终采样率,0为不进行重采样"),
+                             value=0,
+                             step=1,
+                             interactive=True,
+                             visible=False
+                         )
+                         rms_mix_rate0 = gr.Slider(
+                             minimum=0,
+                             maximum=1,
+                             label=i18n("Use the input volume envelope to replace or mix with the output volume envelope. The closer the ratio is to 1, the more the output envelope is used:"),
+                             value=0.21,
+                             interactive=True,
+                         )
+                         protect0 = gr.Slider(
+                             minimum=0,
+                             maximum=0.5,
+                             label=i18n("Protect voiceless consonants and breath sounds to prevent artifacts such as tearing in electronic music. Set to 0.5 to disable. Lower values strengthen the protection but may reduce index accuracy:"),
+                             value=0.33,
+                             step=0.01,
+                             interactive=True,
+                         )
+                         formanting = gr.Checkbox(
+                             value=bool(DoFormant),
+                             label="[EXPERIMENTAL] Formant shift inference audio",
+                             info="Used for male to female and vice-versa conversions",
+                             interactive=True,
+                             visible=False,
+                         )
+
+                         formant_preset = gr.Dropdown(
+                             value='',
+                             choices=get_fshift_presets(),
+                             label="browse presets for formanting",
+                             visible=bool(DoFormant),
+                         )
+                         formant_refresh_button = gr.Button(
+                             value='\U0001f504',
+                             visible=bool(DoFormant),
+                             variant='primary',
+                         )
+                         # formant_refresh_button = ToolButton(elem_id='1')
+                         # create_refresh_button(formant_preset, lambda: {"choices": formant_preset}, "refresh_list_shiftpresets")
+
+                         qfrency = gr.Slider(
+                             value=Quefrency,
+                             info="Default value is 1.0",
+                             label="Quefrency for formant shifting",
+                             minimum=0.0,
+                             maximum=16.0,
+                             step=0.1,
+                             visible=bool(DoFormant),
+                             interactive=True,
+                         )
+                         tmbre = gr.Slider(
+                             value=Timbre,
+                             info="Default value is 1.0",
+                             label="Timbre for formant shifting",
+                             minimum=0.0,
+                             maximum=16.0,
+                             step=0.1,
+                             visible=bool(DoFormant),
+                             interactive=True,
+                         )
+
+                         formant_preset.change(fn=preset_apply, inputs=[formant_preset, qfrency, tmbre], outputs=[qfrency, tmbre])
+                         frmntbut = gr.Button("Apply", variant="primary", visible=bool(DoFormant))
+                         formanting.change(fn=formant_enabled, inputs=[formanting, qfrency, tmbre, frmntbut, formant_preset, formant_refresh_button], outputs=[formanting, qfrency, tmbre, frmntbut, formant_preset, formant_refresh_button])
+                         frmntbut.click(fn=formant_apply, inputs=[qfrency, tmbre], outputs=[qfrency, tmbre])
+                         formant_refresh_button.click(fn=update_fshift_presets, inputs=[formant_preset, qfrency, tmbre], outputs=[formant_preset, qfrency, tmbre])
+ with gr.Row():
1684
+ vc_output1 = gr.Textbox("")
1685
+ f0_file = gr.File(label=i18n("F0曲线文件, 可选, 一行一个音高, 代替默认F0及升降调"), visible=False)
1686
+
1687
+ but0.click(
1688
+ vc_single,
1689
+ [
1690
+ spk_item,
1691
+ input_audio0,
1692
+ vc_transform0,
1693
+ f0_file,
1694
+ f0method0,
1695
+ file_index1,
1696
+ # file_index2,
1697
+ # file_big_npy1,
1698
+ index_rate1,
1699
+ filter_radius0,
1700
+ resample_sr0,
1701
+ rms_mix_rate0,
1702
+ protect0,
1703
+ crepe_hop_length
1704
+ ],
1705
+ [vc_output1, vc_output2],
1706
+ )
1707
+
1708
+ with gr.Accordion("Batch Conversion",open=False, visible=False):
1709
+ with gr.Row():
1710
+ with gr.Column():
1711
+ vc_transform1 = gr.Number(
1712
+ label=i18n("变调(整数, 半音数量, 升八度12降八度-12)"), value=0
1713
+ )
1714
+ opt_input = gr.Textbox(label=i18n("指定输出文件夹"), value="opt")
1715
+ f0method1 = gr.Radio(
1716
+ label=i18n(
1717
+ "选择音高提取算法,输入歌声可用pm提速,harvest低音好但巨慢无比,crepe效果好但吃GPU"
1718
+ ),
1719
+ choices=["pm", "harvest", "crepe", "rmvpe"],
1720
+ value="rmvpe",
1721
+ interactive=True,
1722
+ )
1723
+ filter_radius1 = gr.Slider(
1724
+ minimum=0,
1725
+ maximum=7,
1726
+ label=i18n(">=3则使用对harvest音高识别的结果使用中值滤波,数值为滤波半径,使用可以削弱哑音"),
1727
+ value=3,
1728
+ step=1,
1729
+ interactive=True,
1730
+ )
1731
+ with gr.Column():
1732
+ file_index3 = gr.Textbox(
1733
+ label=i18n("特征检索库文件路径,为空则使用下拉的选择结果"),
1734
+ value="",
1735
+ interactive=True,
1736
+ )
1737
+ file_index4 = gr.Dropdown(
1738
+ label=i18n("自动检测index路径,下拉式选择(dropdown)"),
1739
+ choices=sorted(index_paths),
1740
+ interactive=True,
1741
+ )
1742
+ refresh_button.click(
1743
+ fn=lambda: change_choices()[1],
1744
+ inputs=[],
1745
+ outputs=file_index4,
1746
+ )
1747
+ # file_big_npy2 = gr.Textbox(
1748
+ # label=i18n("特征文件路径"),
1749
+ # value="E:\\codes\\py39\\vits_vc_gpu_train\\logs\\mi-test-1key\\total_fea.npy",
1750
+ # interactive=True,
1751
+ # )
1752
+ index_rate2 = gr.Slider(
1753
+ minimum=0,
1754
+ maximum=1,
1755
+ label=i18n("检索特征占比"),
1756
+ value=1,
1757
+ interactive=True,
1758
+ )
1759
+ with gr.Column():
1760
+ resample_sr1 = gr.Slider(
1761
+ minimum=0,
1762
+ maximum=48000,
1763
+ label=i18n("后处理重采样至最终采样率,0为不进行重采样"),
1764
+ value=0,
1765
+ step=1,
1766
+ interactive=True,
1767
+ )
1768
+ rms_mix_rate1 = gr.Slider(
1769
+ minimum=0,
1770
+ maximum=1,
1771
+ label=i18n("输入源音量包络替换输出音量包络融合比例,越靠近1越使用输出包络"),
1772
+ value=1,
1773
+ interactive=True,
1774
+ )
1775
+ protect1 = gr.Slider(
1776
+ minimum=0,
1777
+ maximum=0.5,
1778
+ label=i18n(
1779
+ "保护清辅音和呼吸声,防止电音撕裂等artifact,拉满0.5不开启,调低加大保护力度但可能降低索引效果"
1780
+ ),
1781
+ value=0.33,
1782
+ step=0.01,
1783
+ interactive=True,
1784
+ )
1785
+ with gr.Column():
1786
+ dir_input = gr.Textbox(
1787
+ label=i18n("输入待处理音频文件夹路径(去文件管理器地址栏拷就行了)"),
1788
+ value="E:\codes\py39\\test-20230416b\\todo-songs",
1789
+ )
1790
+ inputs = gr.File(
1791
+ file_count="multiple", label=i18n("也可批量输入音频文件, 二选一, 优先读文件夹")
1792
+ )
1793
+ with gr.Row():
1794
+ format1 = gr.Radio(
1795
+ label=i18n("导出文件格式"),
1796
+ choices=["wav", "flac", "mp3", "m4a"],
1797
+ value="flac",
1798
+ interactive=True,
1799
+ )
1800
+ but1 = gr.Button(i18n("转换"), variant="primary")
1801
+ vc_output3 = gr.Textbox(label=i18n("输出信息"))
1802
+ but1.click(
1803
+ vc_multi,
1804
+ [
1805
+ spk_item,
1806
+ dir_input,
1807
+ opt_input,
1808
+ inputs,
1809
+ vc_transform1,
1810
+ f0method1,
1811
+ file_index3,
1812
+ file_index4,
1813
+ # file_big_npy2,
1814
+ index_rate2,
1815
+ filter_radius1,
1816
+ resample_sr1,
1817
+ rms_mix_rate1,
1818
+ protect1,
1819
+ format1,
1820
+ crepe_hop_length,
1821
+ ],
1822
+ [vc_output3],
1823
+ )
1824
+ but1.click(fn=lambda: easy_uploader.clear())
1825
+ with gr.TabItem("Загрузка моделей"):
1826
+ with gr.Row():
1827
+ url=gr.Textbox(label="Введите URL-адрес модели:")
1828
+ with gr.Row():
1829
+ model = gr.Textbox(label="Название модели:")
1830
+ download_button=gr.Button("Загрузить")
1831
+ with gr.Row():
1832
+ status_bar=gr.Textbox(label="")
1833
+ download_button.click(fn=download_from_url, inputs=[url, model], outputs=[status_bar])
1834
+ with gr.Row():
1835
+ gr.Markdown(
1836
+ """
1837
+ Made with ❤️ by [Alice Oliveira](https://github.com/aliceoq) | Hosted with ❤️ by [Mateus Elias](https://github.com/mateuseap)
1838
+ """
1839
+ )
1840
+
1841
+ def has_two_files_in_pretrained_folder():
1842
+ pretrained_folder = "./pretrained/"
1843
+ if not os.path.exists(pretrained_folder):
1844
+ return False
1845
+
1846
+ files_in_folder = os.listdir(pretrained_folder)
1847
+ num_files = len(files_in_folder)
1848
+ return num_files >= 2
1849
+
1850
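The pretrained-weights check above can be exercised in isolation. A minimal sketch (using a temporary directory in place of `./pretrained/`, with the base-model filenames the Train tab expects):

```python
import os
import tempfile

def has_two_files(folder):
    # Mirrors has_two_files_in_pretrained_folder above: the Train tab is
    # enabled only when the folder exists and holds at least two files
    # (the pretrained G and D base models).
    if not os.path.exists(folder):
        return False
    return len(os.listdir(folder)) >= 2

with tempfile.TemporaryDirectory() as d:
    print(has_two_files(d))  # False: the folder is empty
    open(os.path.join(d, "f0G40k.pth"), "w").close()
    open(os.path.join(d, "f0D40k.pth"), "w").close()
    print(has_two_files(d))  # True: both base models are present
```

Note the check only counts files; it does not verify that they are the correct G/D checkpoints.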
+ if has_two_files_in_pretrained_folder():
+ print("Pretrained weights are downloaded. Training tab enabled!\n-------------------------------")
+ with gr.TabItem("Train", visible=False):
+ with gr.Row():
+ with gr.Column():
+ exp_dir1 = gr.Textbox(label="Voice Name:", value="My-Voice")
+ sr2 = gr.Radio(
+ label=i18n("目标采样率"),
+ choices=["40k", "48k"],
+ value="40k",
+ interactive=True,
+ visible=False
+ )
+ if_f0_3 = gr.Radio(
+ label=i18n("模型是否带音高指导(唱歌一定要, 语音可以不要)"),
+ choices=[True, False],
+ value=True,
+ interactive=True,
+ visible=False
+ )
+ version19 = gr.Radio(
+ label="RVC version",
+ choices=["v1", "v2"],
+ value="v2",
+ interactive=True,
+ visible=False,
+ )
+ np7 = gr.Slider(
+ minimum=0,
+ maximum=config.n_cpu,
+ step=1,
+ label="# of CPUs for data processing (Leave as it is)",
+ value=config.n_cpu,
+ interactive=True,
+ visible=True
+ )
+ trainset_dir4 = gr.Textbox(label="Path to your dataset (audios, not zip):", value="./dataset")
+ easy_uploader = gr.Files(label='OR Drop your audios here. They will be uploaded in your dataset path above.',file_types=['audio'])
+ but1 = gr.Button("1. Process The Dataset", variant="primary")
+ info1 = gr.Textbox(label="Status (wait until it says 'end preprocess'):", value="")
+ easy_uploader.upload(fn=upload_to_dataset, inputs=[easy_uploader, trainset_dir4], outputs=[info1])
+ but1.click(
+ preprocess_dataset, [trainset_dir4, exp_dir1, sr2, np7], [info1]
+ )
+ with gr.Column():
+ spk_id5 = gr.Slider(
+ minimum=0,
+ maximum=4,
+ step=1,
+ label=i18n("请指定说话人id"),
+ value=0,
+ interactive=True,
+ visible=False
+ )
+ with gr.Accordion('GPU Settings', open=False, visible=False):
+ gpus6 = gr.Textbox(
+ label=i18n("以-分隔输入使用的卡号, 例如 0-1-2 使用卡0和卡1和卡2"),
+ value=gpus,
+ interactive=True,
+ visible=False
+ )
+ gpu_info9 = gr.Textbox(label=i18n("显卡信息"), value=gpu_info)
+ f0method8 = gr.Radio(
+ label=i18n(
+ "选择音高提取算法:输入歌声可用pm提速,高质量语音但CPU差可用dio提速,harvest质量更好但慢"
+ ),
+ choices=["harvest","crepe", "mangio-crepe", "rmvpe"], # Fork feature: Crepe on f0 extraction for training.
+ value="rmvpe",
+ interactive=True,
+ )
+
+ extraction_crepe_hop_length = gr.Slider(
+ minimum=1,
+ maximum=512,
+ step=1,
+ label=i18n("crepe_hop_length"),
+ value=128,
+ interactive=True,
+ visible=False,
+ )
+ f0method8.change(fn=whethercrepeornah, inputs=[f0method8], outputs=[extraction_crepe_hop_length])
+ but2 = gr.Button("2. Pitch Extraction", variant="primary")
+ info2 = gr.Textbox(label="Status(Check the Colab Notebook's cell output):", value="", max_lines=8)
+ but2.click(
+ extract_f0_feature,
+ [gpus6, np7, f0method8, if_f0_3, exp_dir1, version19, extraction_crepe_hop_length],
+ [info2],
+ )
+ with gr.Row():
+ with gr.Column():
+ total_epoch11 = gr.Slider(
+ minimum=1,
+ maximum=5000,
+ step=10,
+ label="Total # of training epochs (IF you choose a value too high, your model will sound horribly overtrained.):",
+ value=250,
+ interactive=True,
+ )
+ butstop = gr.Button(
+ "Stop Training",
+ variant='primary',
+ visible=False,
+ )
+ but3 = gr.Button("3. Train Model", variant="primary", visible=False)
+
+ but3.click(fn=stoptraining, inputs=[gr.Number(value=0, visible=False)], outputs=[but3, butstop])
+ butstop.click(fn=stoptraining, inputs=[gr.Number(value=1, visible=False)], outputs=[butstop, but3])
+
+
+ but4 = gr.Button("4. Train Index", variant="primary")
+ info3 = gr.Textbox(label="Status(Check the Colab Notebook's cell output):", value="", max_lines=10)
+ with gr.Accordion("Training Preferences (You can leave these as they are)", open=False):
+ #gr.Markdown(value=i18n("step3: 填写训练设置, 开始训练模型和索引"))
+ with gr.Column():
+ save_epoch10 = gr.Slider(
+ minimum=1,
+ maximum=200,
+ step=1,
+ label="Backup every X amount of epochs:",
+ value=10,
+ interactive=True,
+ )
+ batch_size12 = gr.Slider(
+ minimum=1,
+ maximum=40,
+ step=1,
+ label="Batch Size (LEAVE IT unless you know what you're doing!):",
+ value=default_batch_size,
+ interactive=True,
+ )
+ if_save_latest13 = gr.Checkbox(
+ label="Save only the latest '.ckpt' file to save disk space.",
+ value=True,
+ interactive=True,
+ )
+ if_cache_gpu17 = gr.Checkbox(
+ label="Cache all training sets to GPU memory. Caching small datasets (less than 10 minutes) can speed up training, but caching large datasets will consume a lot of GPU memory and may not provide much speed improvement.",
+ value=False,
+ interactive=True,
+ )
+ if_save_every_weights18 = gr.Checkbox(
+ label="Save a small final model to the 'weights' folder at each save point.",
+ value=True,
+ interactive=True,
+ )
+ zip_model = gr.Button('5. Download Model')
+ zipped_model = gr.Files(label='Your Model and Index file can be downloaded here:')
+ zip_model.click(fn=zip_downloader, inputs=[exp_dir1], outputs=[zipped_model, info3])
+ with gr.Group():
+ with gr.Accordion("Base Model Locations:", open=False, visible=False):
+ pretrained_G14 = gr.Textbox(
+ label=i18n("加载预训练底模G路径"),
+ value="pretrained_v2/f0G40k.pth",
+ interactive=True,
+ )
+ pretrained_D15 = gr.Textbox(
+ label=i18n("加载预训练底模D路径"),
+ value="pretrained_v2/f0D40k.pth",
+ interactive=True,
+ )
+ gpus16 = gr.Textbox(
+ label=i18n("以-分隔输入使用的卡号, 例如 0-1-2 使用卡0和卡1和卡2"),
+ value=gpus,
+ interactive=True,
+ )
+ sr2.change(
+ change_sr2,
+ [sr2, if_f0_3, version19],
+ [pretrained_G14, pretrained_D15, version19],
+ )
+ version19.change(
+ change_version19,
+ [sr2, if_f0_3, version19],
+ [pretrained_G14, pretrained_D15],
+ )
+ if_f0_3.change(
+ change_f0,
+ [if_f0_3, sr2, version19],
+ [f0method8, pretrained_G14, pretrained_D15],
+ )
+ but5 = gr.Button(i18n("一键训练"), variant="primary", visible=False)
+ but3.click(
+ click_train,
+ [
+ exp_dir1,
+ sr2,
+ if_f0_3,
+ spk_id5,
+ save_epoch10,
+ total_epoch11,
+ batch_size12,
+ if_save_latest13,
+ pretrained_G14,
+ pretrained_D15,
+ gpus16,
+ if_cache_gpu17,
+ if_save_every_weights18,
+ version19,
+ ],
+ [
+ info3,
+ butstop,
+ but3,
+ ],
+ )
+ but4.click(train_index, [exp_dir1, version19], info3)
+ but5.click(
+ train1key,
+ [
+ exp_dir1,
+ sr2,
+ if_f0_3,
+ trainset_dir4,
+ spk_id5,
+ np7,
+ f0method8,
+ save_epoch10,
+ total_epoch11,
+ batch_size12,
+ if_save_latest13,
+ pretrained_G14,
+ pretrained_D15,
+ gpus16,
+ if_cache_gpu17,
+ if_save_every_weights18,
+ version19,
+ extraction_crepe_hop_length
+ ],
+ info3,
+ )
+
+ else:
+ print(
+ "Pretrained weights not downloaded. Disabling training tab.\n"
+ "Wondering how to train a voice? Visit here for the RVC model training guide: https://t.ly/RVC_Training_Guide\n"
+ "-------------------------------\n"
+ )
+
+ app.queue(concurrency_count=511, max_size=1022).launch(share=False, quiet=True)
+ #endregion
config.py ADDED
@@ -0,0 +1,204 @@
+ import argparse
+ import sys
+ import torch
+ import json
+ from multiprocessing import cpu_count
+
+ global usefp16
+ usefp16 = False
+
+
+ def use_fp32_config():
+ usefp16 = False
+ device_capability = 0
+ if torch.cuda.is_available():
+ device = torch.device("cuda:0") # Assuming you have only one GPU (index 0).
+ device_capability = torch.cuda.get_device_capability(device)[0]
+ if device_capability >= 7:
+ usefp16 = True
+ for config_file in ["32k.json", "40k.json", "48k.json"]:
+ with open(f"configs/{config_file}", "r") as d:
+ data = json.load(d)
+
+ if "train" in data and "fp16_run" in data["train"]:
+ data["train"]["fp16_run"] = True
+
+ with open(f"configs/{config_file}", "w") as d:
+ json.dump(data, d, indent=4)
+
+ print(f"Set fp16_run to true in {config_file}")
+
+ with open(
+ "trainset_preprocess_pipeline_print.py", "r", encoding="utf-8"
+ ) as f:
+ strr = f.read()
+
+ strr = strr.replace("3.0", "3.7")
+
+ with open(
+ "trainset_preprocess_pipeline_print.py", "w", encoding="utf-8"
+ ) as f:
+ f.write(strr)
+ else:
+ for config_file in ["32k.json", "40k.json", "48k.json"]:
+ with open(f"configs/{config_file}", "r") as f:
+ data = json.load(f)
+
+ if "train" in data and "fp16_run" in data["train"]:
+ data["train"]["fp16_run"] = False
+
+ with open(f"configs/{config_file}", "w") as d:
+ json.dump(data, d, indent=4)
+
+ print(f"Set fp16_run to false in {config_file}")
+
+ with open(
+ "trainset_preprocess_pipeline_print.py", "r", encoding="utf-8"
+ ) as f:
+ strr = f.read()
+
+ strr = strr.replace("3.7", "3.0")
+
+ with open(
+ "trainset_preprocess_pipeline_print.py", "w", encoding="utf-8"
+ ) as f:
+ f.write(strr)
+ else:
+ print(
+ "CUDA is not available. Make sure you have an NVIDIA GPU and CUDA installed."
+ )
+ return (usefp16, device_capability)
+
+
+ class Config:
+ def __init__(self):
+ self.device = "cuda:0"
+ self.is_half = True
+ self.n_cpu = 0
+ self.gpu_name = None
+ self.gpu_mem = None
+ (
+ self.python_cmd,
+ self.listen_port,
+ self.iscolab,
+ self.noparallel,
+ self.noautoopen,
+ self.paperspace,
+ self.is_cli,
+ ) = self.arg_parse()
+
+ self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
+
+ @staticmethod
+ def arg_parse() -> tuple:
+ exe = sys.executable or "python"
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--port", type=int, default=7865, help="Listen port")
+ parser.add_argument("--pycmd", type=str, default=exe, help="Python command")
+ parser.add_argument("--colab", action="store_true", help="Launch in colab")
+ parser.add_argument(
+ "--noparallel", action="store_true", help="Disable parallel processing"
+ )
+ parser.add_argument(
+ "--noautoopen",
+ action="store_true",
+ help="Do not open in browser automatically",
+ )
+ parser.add_argument( # Fork Feature. Paperspace integration for web UI
+ "--paperspace",
+ action="store_true",
+ help="Note that this argument just shares a gradio link for the web UI. Thus can be used on other non-local CLI systems.",
+ )
+ parser.add_argument( # Fork Feature. Embed a CLI into the infer-web.py
+ "--is_cli",
+ action="store_true",
+ help="Use the CLI instead of setting up a gradio UI. This flag will launch an RVC text interface where you can execute functions from infer-web.py!",
+ )
+ cmd_opts = parser.parse_args()
+
+ cmd_opts.port = cmd_opts.port if 0 <= cmd_opts.port <= 65535 else 7865
+
+ return (
+ cmd_opts.pycmd,
+ cmd_opts.port,
+ cmd_opts.colab,
+ cmd_opts.noparallel,
+ cmd_opts.noautoopen,
+ cmd_opts.paperspace,
+ cmd_opts.is_cli,
+ )
+
+ # has_mps is only available in nightly pytorch (for now) and macOS 12.3+.
+ # check `getattr` and try it for compatibility
+ @staticmethod
+ def has_mps() -> bool:
+ if not torch.backends.mps.is_available():
+ return False
+ try:
+ torch.zeros(1).to(torch.device("mps"))
+ return True
+ except Exception:
+ return False
+
+ def device_config(self) -> tuple:
+ if torch.cuda.is_available():
+ i_device = int(self.device.split(":")[-1])
+ self.gpu_name = torch.cuda.get_device_name(i_device)
+ if (
+ ("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
+ or "P40" in self.gpu_name.upper()
+ or "1060" in self.gpu_name
+ or "1070" in self.gpu_name
+ or "1080" in self.gpu_name
+ ):
+ print("Found GPU", self.gpu_name, ", force to fp32")
+ self.is_half = False
+ else:
+ print("Found GPU", self.gpu_name)
+ use_fp32_config()
+ self.gpu_mem = int(
+ torch.cuda.get_device_properties(i_device).total_memory
+ / 1024
+ / 1024
+ / 1024
+ + 0.4
+ )
+ if self.gpu_mem <= 4:
+ with open("trainset_preprocess_pipeline_print.py", "r") as f:
+ strr = f.read().replace("3.7", "3.0")
+ with open("trainset_preprocess_pipeline_print.py", "w") as f:
+ f.write(strr)
+ elif self.has_mps():
+ print("No supported Nvidia GPU found, use MPS instead")
+ self.device = "mps"
+ self.is_half = False
+ use_fp32_config()
+ else:
+ print("No supported Nvidia GPU found, use CPU instead")
+ self.device = "cpu"
+ self.is_half = False
+ use_fp32_config()
+
+ if self.n_cpu == 0:
+ self.n_cpu = cpu_count()
+
+ if self.is_half:
+ # 6G显存配置
+ x_pad = 3
+ x_query = 10
+ x_center = 60
+ x_max = 65
+ else:
+ # 5G显存配置
+ x_pad = 1
+ x_query = 6
+ x_center = 38
+ x_max = 41
+
+ if self.gpu_mem is not None and self.gpu_mem <= 4:
+ x_pad = 1
+ x_query = 5
+ x_center = 30
+ x_max = 32
+
+ return x_pad, x_query, x_center, x_max
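The window sizes picked at the end of `device_config` amount to a three-tier table: larger inference windows for fp16-capable GPUs, smaller ones for fp32, and a clamp for cards with 4 GB of VRAM or less. A minimal sketch of just that selection (the helper name `pipeline_windows` is mine; the values are the ones above):

```python
def pipeline_windows(is_half, gpu_mem=None):
    # fp16-capable GPUs get the larger inference windows (the "6G" tier);
    # fp32 falls back to the smaller "5G" tier.
    if is_half:
        x_pad, x_query, x_center, x_max = 3, 10, 60, 65
    else:
        x_pad, x_query, x_center, x_max = 1, 6, 38, 41
    # Cards with <= 4 GB of VRAM are clamped to the smallest windows.
    if gpu_mem is not None and gpu_mem <= 4:
        x_pad, x_query, x_center, x_max = 1, 5, 30, 32
    return x_pad, x_query, x_center, x_max

print(pipeline_windows(True))     # (3, 10, 60, 65)
print(pipeline_windows(False))    # (1, 6, 38, 41)
print(pipeline_windows(True, 4))  # (1, 5, 30, 32)
```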
gitattributes.txt ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
gitignore.txt ADDED
@@ -0,0 +1,12 @@
+ __pycache__/
+ weights/
+ TEMP/
+ logs/
+ csvdb/
+
+ # Environment
+ venv/
+
+ # Models
+ hubert_base.pt
+ rmvpe.pt
i18n.py ADDED
@@ -0,0 +1,28 @@
+ import locale
+ import json
+ import os
+
+
+ def load_language_list(language):
+ with open(f"./i18n/{language}.json", "r", encoding="utf-8") as f:
+ language_list = json.load(f)
+ return language_list
+
+
+ class I18nAuto:
+ def __init__(self, language=None):
+ if language in ["Auto", None]:
+ language = locale.getdefaultlocale()[
+ 0
+ ] # getlocale can't identify the system's language ((None, None))
+ if not os.path.exists(f"./i18n/{language}.json"):
+ language = "en_US"
+ self.language = language
+ # print("Use Language:", language)
+ self.language_map = load_language_list(language)
+
+ def __call__(self, key):
+ return self.language_map.get(key, key)
+
+ def print(self):
+ print("Use Language:", self.language)
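The lookup in `I18nAuto.__call__` falls back to the key itself, which is why labels with no entry in the language JSON still render verbatim instead of raising `KeyError`. A minimal sketch of that fallback (the class name `MiniI18n` and the sample map are mine; no JSON files are read):

```python
class MiniI18n:
    # Stand-in for I18nAuto above: translation is a plain dict lookup
    # that returns the key unchanged when no translation exists.
    def __init__(self, language_map):
        self.language_map = language_map

    def __call__(self, key):
        return self.language_map.get(key, key)

i18n = MiniI18n({"目标采样率": "Target sample rate"})
print(i18n("目标采样率"))    # "Target sample rate": key found in the map
print(i18n("unmapped key"))  # "unmapped key": untranslated keys pass through
```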
packages.txt ADDED
@@ -0,0 +1,3 @@
+ build-essential
+ ffmpeg
+ aria2
requirements.txt ADDED
@@ -0,0 +1,22 @@
+ gTTS
+ elevenlabs
+ stftpitchshift==1.5.1
+ torchcrepe
+ setuptools
+ wheel
+ httpx==0.23.0
+ faiss-gpu
+ fairseq
+ gradio==3.34.0
+ ffmpeg-python
+ praat-parselmouth
+ pyworld
+ numpy==1.23.5
+ i18n
+ numba==0.56.4
+ librosa==0.9.2
+ mega.py
+ gdown
+ onnxruntime
+ pyngrok==4.1.12
+ torch
rmvpe.py ADDED
@@ -0,0 +1,432 @@
+ import sys, torch, numpy as np, traceback, pdb
+ import torch.nn as nn
+ from time import time as ttime
+ import torch.nn.functional as F
+
+
+ class BiGRU(nn.Module):
+ def __init__(self, input_features, hidden_features, num_layers):
+ super(BiGRU, self).__init__()
+ self.gru = nn.GRU(
+ input_features,
+ hidden_features,
+ num_layers=num_layers,
+ batch_first=True,
+ bidirectional=True,
+ )
+
+ def forward(self, x):
+ return self.gru(x)[0]
+
+
+ class ConvBlockRes(nn.Module):
+ def __init__(self, in_channels, out_channels, momentum=0.01):
+ super(ConvBlockRes, self).__init__()
+ self.conv = nn.Sequential(
+ nn.Conv2d(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=(3, 3),
+ stride=(1, 1),
+ padding=(1, 1),
+ bias=False,
+ ),
+ nn.BatchNorm2d(out_channels, momentum=momentum),
+ nn.ReLU(),
+ nn.Conv2d(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=(3, 3),
+ stride=(1, 1),
+ padding=(1, 1),
+ bias=False,
+ ),
+ nn.BatchNorm2d(out_channels, momentum=momentum),
+ nn.ReLU(),
+ )
+ if in_channels != out_channels:
+ self.shortcut = nn.Conv2d(in_channels, out_channels, (1, 1))
+ self.is_shortcut = True
+ else:
+ self.is_shortcut = False
+
+ def forward(self, x):
+ if self.is_shortcut:
+ return self.conv(x) + self.shortcut(x)
+ else:
+ return self.conv(x) + x
+
+
+ class Encoder(nn.Module):
+ def __init__(
+ self,
+ in_channels,
+ in_size,
+ n_encoders,
+ kernel_size,
+ n_blocks,
+ out_channels=16,
+ momentum=0.01,
+ ):
+ super(Encoder, self).__init__()
+ self.n_encoders = n_encoders
+ self.bn = nn.BatchNorm2d(in_channels, momentum=momentum)
+ self.layers = nn.ModuleList()
+ self.latent_channels = []
+ for i in range(self.n_encoders):
+ self.layers.append(
+ ResEncoderBlock(
+ in_channels, out_channels, kernel_size, n_blocks, momentum=momentum
+ )
+ )
+ self.latent_channels.append([out_channels, in_size])
+ in_channels = out_channels
+ out_channels *= 2
+ in_size //= 2
+ self.out_size = in_size
+ self.out_channel = out_channels
+
+ def forward(self, x):
+ concat_tensors = []
+ x = self.bn(x)
+ for i in range(self.n_encoders):
+ _, x = self.layers[i](x)
+ concat_tensors.append(_)
+ return x, concat_tensors
+
+
+ class ResEncoderBlock(nn.Module):
+ def __init__(
+ self, in_channels, out_channels, kernel_size, n_blocks=1, momentum=0.01
+ ):
+ super(ResEncoderBlock, self).__init__()
+ self.n_blocks = n_blocks
+ self.conv = nn.ModuleList()
+ self.conv.append(ConvBlockRes(in_channels, out_channels, momentum))
+ for i in range(n_blocks - 1):
+ self.conv.append(ConvBlockRes(out_channels, out_channels, momentum))
+ self.kernel_size = kernel_size
+ if self.kernel_size is not None:
+ self.pool = nn.AvgPool2d(kernel_size=kernel_size)
+
+ def forward(self, x):
+ for i in range(self.n_blocks):
+ x = self.conv[i](x)
+ if self.kernel_size is not None:
+ return x, self.pool(x)
+ else:
+ return x
+
+
+ class Intermediate(nn.Module): #
+ def __init__(self, in_channels, out_channels, n_inters, n_blocks, momentum=0.01):
+ super(Intermediate, self).__init__()
+ self.n_inters = n_inters
+ self.layers = nn.ModuleList()
+ self.layers.append(
+ ResEncoderBlock(in_channels, out_channels, None, n_blocks, momentum)
+ )
+ for i in range(self.n_inters - 1):
+ self.layers.append(
+ ResEncoderBlock(out_channels, out_channels, None, n_blocks, momentum)
+ )
+
+ def forward(self, x):
+ for i in range(self.n_inters):
+ x = self.layers[i](x)
+ return x
+
+
+ class ResDecoderBlock(nn.Module):
+ def __init__(self, in_channels, out_channels, stride, n_blocks=1, momentum=0.01):
+ super(ResDecoderBlock, self).__init__()
+ out_padding = (0, 1) if stride == (1, 2) else (1, 1)
+ self.n_blocks = n_blocks
+ self.conv1 = nn.Sequential(
+ nn.ConvTranspose2d(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=(3, 3),
+ stride=stride,
+ padding=(1, 1),
+ output_padding=out_padding,
+ bias=False,
+ ),
+ nn.BatchNorm2d(out_channels, momentum=momentum),
+ nn.ReLU(),
+ )
+ self.conv2 = nn.ModuleList()
+ self.conv2.append(ConvBlockRes(out_channels * 2, out_channels, momentum))
+ for i in range(n_blocks - 1):
+ self.conv2.append(ConvBlockRes(out_channels, out_channels, momentum))
+
+ def forward(self, x, concat_tensor):
+ x = self.conv1(x)
+ x = torch.cat((x, concat_tensor), dim=1)
+ for i in range(self.n_blocks):
+ x = self.conv2[i](x)
+ return x
+
+
+ class Decoder(nn.Module):
+ def __init__(self, in_channels, n_decoders, stride, n_blocks, momentum=0.01):
+ super(Decoder, self).__init__()
+ self.layers = nn.ModuleList()
+ self.n_decoders = n_decoders
+ for i in range(self.n_decoders):
+ out_channels = in_channels // 2
+ self.layers.append(
+ ResDecoderBlock(in_channels, out_channels, stride, n_blocks, momentum)
+ )
+ in_channels = out_channels
+
+ def forward(self, x, concat_tensors):
+ for i in range(self.n_decoders):
+ x = self.layers[i](x, concat_tensors[-1 - i])
+ return x
+
+
+ class DeepUnet(nn.Module):
+ def __init__(
+ self,
+ kernel_size,
+ n_blocks,
+ en_de_layers=5,
+ inter_layers=4,
+ in_channels=1,
+ en_out_channels=16,
+ ):
+ super(DeepUnet, self).__init__()
+ self.encoder = Encoder(
+ in_channels, 128, en_de_layers, kernel_size, n_blocks, en_out_channels
+ )
+ self.intermediate = Intermediate(
+ self.encoder.out_channel // 2,
+ self.encoder.out_channel,
+ inter_layers,
+ n_blocks,
+ )
+ self.decoder = Decoder(
+ self.encoder.out_channel, en_de_layers, kernel_size, n_blocks
+ )
+
+ def forward(self, x):
+ x, concat_tensors = self.encoder(x)
+ x = self.intermediate(x)
+ x = self.decoder(x, concat_tensors)
+ return x
+
+
+ class E2E(nn.Module):
+ def __init__(
+ self,
+ n_blocks,
+ n_gru,
+ kernel_size,
+ en_de_layers=5,
+ inter_layers=4,
+ in_channels=1,
+ en_out_channels=16,
+ ):
+ super(E2E, self).__init__()
+ self.unet = DeepUnet(
+ kernel_size,
+ n_blocks,
+ en_de_layers,
+ inter_layers,
+ in_channels,
+ en_out_channels,
+ )
+ self.cnn = nn.Conv2d(en_out_channels, 3, (3, 3), padding=(1, 1))
+ if n_gru:
+ self.fc = nn.Sequential(
+ BiGRU(3 * 128, 256, n_gru),
+ nn.Linear(512, 360),
+ nn.Dropout(0.25),
+ nn.Sigmoid(),
+ )
+ else:
+ self.fc = nn.Sequential(
+ nn.Linear(3 * N_MELS, N_CLASS), nn.Dropout(0.25), nn.Sigmoid()
+ )
+
+ def forward(self, mel):
+ mel = mel.transpose(-1, -2).unsqueeze(1)
+ x = self.cnn(self.unet(mel)).transpose(1, 2).flatten(-2)
+ x = self.fc(x)
+ return x
+
+
+ from librosa.filters import mel
+
+
+ class MelSpectrogram(torch.nn.Module):
+ def __init__(
+ self,
+ is_half,
+ n_mel_channels,
+ sampling_rate,
+ win_length,
+ hop_length,
+ n_fft=None,
+ mel_fmin=0,
+ mel_fmax=None,
+ clamp=1e-5,
+ ):
+ super().__init__()
+ n_fft = win_length if n_fft is None else n_fft
+ self.hann_window = {}
+ mel_basis = mel(
+ sr=sampling_rate,
+ n_fft=n_fft,
+ n_mels=n_mel_channels,
+ fmin=mel_fmin,
+ fmax=mel_fmax,
+ htk=True,
+ )
+ mel_basis = torch.from_numpy(mel_basis).float()
+ self.register_buffer("mel_basis", mel_basis)
+ self.n_fft = win_length if n_fft is None else n_fft
+ self.hop_length = hop_length
+ self.win_length = win_length
+ self.sampling_rate = sampling_rate
+ self.n_mel_channels = n_mel_channels
+ self.clamp = clamp
+ self.is_half = is_half
+
+ def forward(self, audio, keyshift=0, speed=1, center=True):
+ factor = 2 ** (keyshift / 12)
+ n_fft_new = int(np.round(self.n_fft * factor))
+ win_length_new = int(np.round(self.win_length * factor))
+ hop_length_new = int(np.round(self.hop_length * speed))
+ keyshift_key = str(keyshift) + "_" + str(audio.device)
+ if keyshift_key not in self.hann_window:
+ self.hann_window[keyshift_key] = torch.hann_window(win_length_new).to(
+ audio.device
+ )
+ fft = torch.stft(
+ audio,
+ n_fft=n_fft_new,
+ hop_length=hop_length_new,
+ win_length=win_length_new,
+ window=self.hann_window[keyshift_key],
+ center=center,
+ return_complex=True,
+ )
+ magnitude = torch.sqrt(fft.real.pow(2) + fft.imag.pow(2))
+ if keyshift != 0:
+ size = self.n_fft // 2 + 1
+ resize = magnitude.size(1)
+ if resize < size:
+ magnitude = F.pad(magnitude, (0, 0, 0, size - resize))
+ magnitude = magnitude[:, :size, :] * self.win_length / win_length_new
+ mel_output = torch.matmul(self.mel_basis, magnitude)
+ if self.is_half == True:
+ mel_output = mel_output.half()
+ log_mel_spec = torch.log(torch.clamp(mel_output, min=self.clamp))
+ return log_mel_spec
+
329
+
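`MelSpectrogram.forward` above resizes the STFT by a per-semitone factor of `2 ** (keyshift / 12)` so that a key shift of one octave doubles the analysis window. A minimal, standalone sketch of just that factor (the constant `1024` is illustrative, matching the `win_length` RMVPE passes in below):

```python
import numpy as np

def keyshift_factor(keyshift):
    # 12 semitones = one octave = a factor of 2 in frequency
    return 2 ** (keyshift / 12)

# One octave up doubles the effective FFT size; one octave down halves it
assert np.isclose(keyshift_factor(12), 2.0)
assert np.isclose(keyshift_factor(-12), 0.5)
n_fft_new = int(np.round(1024 * keyshift_factor(12)))
assert n_fft_new == 2048
```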
+ class RMVPE:
+     def __init__(self, model_path, is_half, device=None):
+         self.resample_kernel = {}
+         model = E2E(4, 1, (2, 2))
+         ckpt = torch.load(model_path, map_location="cpu")
+         model.load_state_dict(ckpt)
+         model.eval()
+         if is_half:
+             model = model.half()
+         self.model = model
+         self.resample_kernel = {}
+         self.is_half = is_half
+         if device is None:
+             device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.device = device
+         self.mel_extractor = MelSpectrogram(
+             is_half, 128, 16000, 1024, 160, None, 30, 8000
+         ).to(device)
+         self.model = self.model.to(device)
+         cents_mapping = 20 * np.arange(360) + 1997.3794084376191
+         self.cents_mapping = np.pad(cents_mapping, (4, 4))  # 368
+ 
+     def mel2hidden(self, mel):
+         with torch.no_grad():
+             n_frames = mel.shape[-1]
+             mel = F.pad(
+                 mel, (0, 32 * ((n_frames - 1) // 32 + 1) - n_frames), mode="reflect"
+             )
+             hidden = self.model(mel)
+             return hidden[:, :n_frames]
+ 
+     def decode(self, hidden, thred=0.03):
+         cents_pred = self.to_local_average_cents(hidden, thred=thred)
+         f0 = 10 * (2 ** (cents_pred / 1200))
+         f0[f0 == 10] = 0  # zero cents means unvoiced, not 10 Hz
+         return f0
+ 
+     def infer_from_audio(self, audio, thred=0.03):
+         audio = torch.from_numpy(audio).float().to(self.device).unsqueeze(0)
+         mel = self.mel_extractor(audio, center=True)
+         hidden = self.mel2hidden(mel)
+         hidden = hidden.squeeze(0).cpu().numpy()
+         if self.is_half:
+             hidden = hidden.astype("float32")
+         f0 = self.decode(hidden, thred=thred)
+         return f0
+ 
+     def to_local_average_cents(self, salience, thred=0.05):
+         center = np.argmax(salience, axis=1)  # (n_frames,) peak bin per frame
+         salience = np.pad(salience, ((0, 0), (4, 4)))  # (n_frames, 368)
+         center += 4
+         todo_salience = []
+         todo_cents_mapping = []
+         starts = center - 4
+         ends = center + 5
+         for idx in range(salience.shape[0]):
+             todo_salience.append(salience[:, starts[idx] : ends[idx]][idx])
+             todo_cents_mapping.append(self.cents_mapping[starts[idx] : ends[idx]])
+         todo_salience = np.array(todo_salience)  # (n_frames, 9)
+         todo_cents_mapping = np.array(todo_cents_mapping)  # (n_frames, 9)
+         product_sum = np.sum(todo_salience * todo_cents_mapping, 1)
+         weight_sum = np.sum(todo_salience, 1)  # (n_frames,)
+         divided = product_sum / weight_sum  # weighted-average cents per frame
+         maxx = np.max(salience, axis=1)  # (n_frames,)
+         divided[maxx <= thred] = 0
+         return divided
+ 
+ 
+ # if __name__ == '__main__':
+ #     audio, sampling_rate = sf.read("卢本伟语录~1.wav")
+ #     if len(audio.shape) > 1:
+ #         audio = librosa.to_mono(audio.transpose(1, 0))
+ #     audio_bak = audio.copy()
+ #     if sampling_rate != 16000:
+ #         audio = librosa.resample(audio, orig_sr=sampling_rate, target_sr=16000)
+ #     model_path = "/bili-coeus/jupyter/jupyterhub-liujing04/vits_ch/test-RMVPE/weights/rmvpe_llc_half.pt"
+ #     thred = 0.03  # 0.01
+ #     device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ #     rmvpe = RMVPE(model_path, is_half=False, device=device)
+ #     t0 = ttime()
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)  # repeated to amortize warm-up
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     t1 = ttime()
+ #     print(f0.shape, t1 - t0)
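The `decode` step above maps a weighted-average cents value back to frequency via `f0 = 10 * 2 ** (cents / 1200)`, i.e. 1200 cents per octave above a 10 Hz reference. A standalone sketch of that mapping and its inverse, reusing the class's 360-bin cents grid:

```python
import numpy as np

# RMVPE's cents grid: 360 bins spaced 20 cents apart,
# offset by the same constant the class above uses
cents_mapping = 20 * np.arange(360) + 1997.3794084376191

def cents_to_hz(cents):
    # 1200 cents = one octave above the 10 Hz reference
    return 10 * (2 ** (cents / 1200))

def hz_to_cents(f0):
    return 1200 * np.log2(f0 / 10)

assert cents_mapping.shape == (360,)
# Round-trip check on A4 = 440 Hz
assert abs(cents_to_hz(hz_to_cents(440.0)) - 440.0) < 1e-9
```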
run.sh ADDED
@@ -0,0 +1,16 @@
+ # Install Debian packages
+ sudo apt-get update
+ sudo apt-get install -qq -y build-essential ffmpeg aria2
+ 
+ # Upgrade pip and setuptools
+ pip install --upgrade pip
+ pip install --upgrade setuptools
+ 
+ # Install wheel (built-package format for Python)
+ pip install wheel
+ 
+ # Install Python packages using pip
+ pip install -r requirements.txt
+ 
+ # Run the application locally at http://127.0.0.1:7860
+ python app.py
utils.py ADDED
@@ -0,0 +1,151 @@
+ import ffmpeg
+ import numpy as np
+ 
+ import os
+ import sys
+ 
+ import random
+ 
+ import csv
+ 
+ platform_stft_mapping = {
+     "linux": "stftpitchshift",
+     "darwin": "stftpitchshift",
+     "win32": "stftpitchshift.exe",
+ }
+ 
+ stft = platform_stft_mapping.get(sys.platform)
+ 
+ 
+ def CSVutil(file, rw, type, *args):
+     if type == "formanting":
+         if rw == "r":
+             with open(file) as fileCSVread:
+                 csv_reader = list(csv.reader(fileCSVread))
+             if not csv_reader:
+                 raise ValueError("No data")
+             return csv_reader[0][0], csv_reader[0][1], csv_reader[0][2]
+         else:
+             doformnt = args[0] if args else False
+             qfr = args[1] if len(args) > 1 else 1.0
+             tmb = args[2] if len(args) > 2 else 1.0
+             with open(file, rw, newline="") as fileCSVwrite:
+                 csv_writer = csv.writer(fileCSVwrite, delimiter=",")
+                 csv_writer.writerow([doformnt, qfr, tmb])
+     elif type == "stop":
+         stop = args[0] if args else False
+         with open(file, rw, newline="") as fileCSVwrite:
+             csv_writer = csv.writer(fileCSVwrite, delimiter=",")
+             csv_writer.writerow([stop])
+ 
+ 
+ def load_audio(file, sr, DoFormant, Quefrency, Timbre):
+     converted = False
+     # The on-disk CSV overrides the arguments passed in
+     DoFormant, Quefrency, Timbre = CSVutil("csvdb/formanting.csv", "r", "formanting")
+     try:
+         # https://github.com/openai/whisper/blob/main/whisper/audio.py#L26
+         # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
+         # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
+         file = (
+             file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         )  # strip stray spaces, quotes, and newlines that pasted paths often carry
+         file_formanted = file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+ 
+         # DoFormant comes back from the CSV as a string; normalize it to a bool
+         if isinstance(DoFormant, str):
+             if DoFormant.lower() == "true":
+                 DoFormant = True
+             elif DoFormant.lower() == "false":
+                 DoFormant = False
+ 
+         if DoFormant:
+             numerator = round(random.uniform(1, 4), 4)
+ 
+             if not file.endswith(".wav"):
+                 if not os.path.isfile(f"{file_formanted}.wav"):
+                     converted = True
+                     (
+                         ffmpeg.input(file_formanted, threads=0)
+                         .output(f"{file_formanted}.wav")
+                         .run(
+                             cmd=["ffmpeg", "-nostdin"],
+                             capture_stdout=True,
+                             capture_stderr=True,
+                         )
+                     )
+ 
+             file_formanted = (
+                 f"{file_formanted}.wav"
+                 if not file_formanted.endswith(".wav")
+                 else file_formanted
+             )
+ 
+             print(f" · Formanting {file_formanted}...\n")
+ 
+             os.system(
+                 '%s -i "%s" -q "%s" -t "%s" -o "%sFORMANTED_%s.wav"'
+                 % (
+                     stft,
+                     file_formanted,
+                     Quefrency,
+                     Timbre,
+                     file_formanted,
+                     str(numerator),
+                 )
+             )
+ 
+             print(f" · Formanted {file_formanted}!\n")
+ 
+             out, _ = (
+                 ffmpeg.input(
+                     "%sFORMANTED_%s.wav" % (file_formanted, str(numerator)), threads=0
+                 )
+                 .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
+                 .run(
+                     cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
+                 )
+             )
+ 
+             try:
+                 os.remove("%sFORMANTED_%s.wav" % (file_formanted, str(numerator)))
+             except Exception:
+                 print("couldn't remove formanted type of file")
+ 
+         else:
+             out, _ = (
+                 ffmpeg.input(file, threads=0)
+                 .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
+                 .run(
+                     cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
+                 )
+             )
+     except Exception as e:
+         raise RuntimeError(f"Failed to load audio: {e}")
+ 
+     if converted:
+         try:
+             os.remove(file_formanted)
+         except Exception:
+             print("couldn't remove converted type of file")
+         converted = False
+ 
+     return np.frombuffer(out, np.float32).flatten()
@@ -0,0 +1,646 @@
 
+ import os
+ import sys
+ import traceback
+ from functools import lru_cache
+ from time import time as ttime
+ 
+ import faiss
+ import librosa
+ import numpy as np
+ import parselmouth
+ import pyworld
+ import torch
+ import torch.nn.functional as F
+ import torchcrepe  # Fork feature: crepe f0 algorithm (pip install torchcrepe)
+ from scipy import signal
+ from torch import Tensor
+ 
+ now_dir = os.getcwd()
+ sys.path.append(now_dir)
+ 
+ bh, ah = signal.butter(N=5, Wn=48, btype="high", fs=16000)
+ 
+ input_audio_path2wav = {}
+ 
+ 
+ @lru_cache
+ def cache_harvest_f0(input_audio_path, fs, f0max, f0min, frame_period):
+     audio = input_audio_path2wav[input_audio_path]
+     f0, t = pyworld.harvest(
+         audio,
+         fs=fs,
+         f0_ceil=f0max,
+         f0_floor=f0min,
+         frame_period=frame_period,
+     )
+     f0 = pyworld.stonemask(audio, f0, t, fs)
+     return f0
+ 
+ 
+ def change_rms(data1, sr1, data2, sr2, rate):
+     # data1 is the input audio, data2 the converted output; rate is data2's weight
+     rms1 = librosa.feature.rms(
+         y=data1, frame_length=sr1 // 2 * 2, hop_length=sr1 // 2
+     )  # one envelope point every half second
+     rms2 = librosa.feature.rms(y=data2, frame_length=sr2 // 2 * 2, hop_length=sr2 // 2)
+     rms1 = torch.from_numpy(rms1)
+     rms1 = F.interpolate(
+         rms1.unsqueeze(0), size=data2.shape[0], mode="linear"
+     ).squeeze()
+     rms2 = torch.from_numpy(rms2)
+     rms2 = F.interpolate(
+         rms2.unsqueeze(0), size=data2.shape[0], mode="linear"
+     ).squeeze()
+     rms2 = torch.max(rms2, torch.zeros_like(rms2) + 1e-6)
+     data2 *= (
+         torch.pow(rms1, torch.tensor(1 - rate))
+         * torch.pow(rms2, torch.tensor(rate - 1))
+     ).numpy()
+     return data2
+ 
+ 
+ class VC(object):
+     def __init__(self, tgt_sr, config):
+         self.x_pad, self.x_query, self.x_center, self.x_max, self.is_half = (
+             config.x_pad,
+             config.x_query,
+             config.x_center,
+             config.x_max,
+             config.is_half,
+         )
+         self.sr = 16000  # hubert input sampling rate
+         self.window = 160  # samples per frame
+         self.t_pad = self.sr * self.x_pad  # padding before/after each chunk
+         self.t_pad_tgt = tgt_sr * self.x_pad
+         self.t_pad2 = self.t_pad * 2
+         self.t_query = self.sr * self.x_query  # search window around each cut point
+         self.t_center = self.sr * self.x_center  # interval between cut points
+         self.t_max = self.sr * self.x_max  # below this duration, no cut search
+         self.device = config.device
+ 
+     # Fork Feature: get the best torch device for f0 algorithms that require one
+     def get_optimal_torch_device(self, index: int = 0) -> torch.device:
+         if torch.cuda.is_available():
+             return torch.device(f"cuda:{index % torch.cuda.device_count()}")
+         elif torch.backends.mps.is_available():
+             return torch.device("mps")
+         # TODO: grab "xla" devices if available (requires torch_xla.core.xla_model)
+         return torch.device("cpu")
+ 
+     # Fork Feature: compute f0 with the crepe method
+     def get_f0_crepe_computation(
+         self,
+         x,
+         f0_min,
+         f0_max,
+         p_len,
+         hop_length=160,  # 512 before. Lower hop lengths mean more pitch accuracy but longer inference time.
+         model="full",  # either crepe-tiny ("tiny") or crepe ("full"); default is full
+     ):
+         x = x.astype(np.float32)  # fixes the F.conv2d exception (double -> float)
+         x /= np.quantile(np.abs(x), 0.999)
+         torch_device = self.get_optimal_torch_device()
+         audio = torch.from_numpy(x).to(torch_device, copy=True)
+         audio = torch.unsqueeze(audio, dim=0)
+         if audio.ndim == 2 and audio.shape[0] > 1:
+             audio = torch.mean(audio, dim=0, keepdim=True).detach()
+         audio = audio.detach()
+         print("Initiating prediction with a crepe_hop_length of: " + str(hop_length))
+         pitch: Tensor = torchcrepe.predict(
+             audio,
+             self.sr,
+             hop_length,
+             f0_min,
+             f0_max,
+             model,
+             batch_size=hop_length * 2,
+             device=torch_device,
+             pad=True,
+         )
+         p_len = p_len or x.shape[0] // hop_length
+         # Resize the pitch for the final f0
+         source = np.array(pitch.squeeze(0).cpu().float().numpy())
+         source[source < 0.001] = np.nan
+         target = np.interp(
+             np.arange(0, len(source) * p_len, len(source)) / p_len,
+             np.arange(0, len(source)),
+             source,
+         )
+         f0 = np.nan_to_num(target)
+         return f0  # resized f0
+ 
+     def get_f0_official_crepe_computation(
+         self,
+         x,
+         f0_min,
+         f0_max,
+         model="full",
+     ):
+         # Pick a batch size that doesn't cause memory errors on your gpu
+         batch_size = 512
+         # Compute pitch using the first gpu
+         audio = torch.tensor(np.copy(x))[None].float()
+         f0, pd = torchcrepe.predict(
+             audio,
+             self.sr,
+             self.window,
+             f0_min,
+             f0_max,
+             model,
+             batch_size=batch_size,
+             device=self.device,
+             return_periodicity=True,
+         )
+         pd = torchcrepe.filter.median(pd, 3)
+         f0 = torchcrepe.filter.mean(f0, 3)
+         f0[pd < 0.1] = 0
+         f0 = f0[0].cpu().numpy()
+         return f0
+ 
+     # Fork Feature: compute pYIN f0 method
+     def get_f0_pyin_computation(self, x, f0_min, f0_max):
+         f0, _, _ = librosa.pyin(
+             x.astype(np.float32), sr=self.sr, fmin=f0_min, fmax=f0_max
+         )
+         f0 = f0[1:]  # get rid of the extra first frame
+         return f0
+ 
+     # Fork Feature: acquire median hybrid f0 estimation calculation
+     def get_f0_hybrid_computation(
+         self,
+         methods_str,
+         input_audio_path,
+         x,
+         f0_min,
+         f0_max,
+         p_len,
+         filter_radius,
+         crepe_hop_length,
+         time_step,
+     ):
+         # Parse the f0 methods to stack, e.g. "hybrid[pm+harvest]"
+         s = methods_str.split("hybrid")[1]
+         s = s.replace("[", "").replace("]", "")
+         methods = s.split("+")
+         f0_computation_stack = []
+ 
+         print("Calculating f0 pitch estimations for methods: %s" % str(methods))
+         x = x.astype(np.float32)
+         x /= np.quantile(np.abs(x), 0.999)
+         # Get f0 calculations for all methods specified
+         for method in methods:
+             f0 = None
+             if method == "pm":
+                 f0 = (
+                     parselmouth.Sound(x, self.sr)
+                     .to_pitch_ac(
+                         time_step=time_step / 1000,
+                         voicing_threshold=0.6,
+                         pitch_floor=f0_min,
+                         pitch_ceiling=f0_max,
+                     )
+                     .selected_array["frequency"]
+                 )
+                 pad_size = (p_len - len(f0) + 1) // 2
+                 if pad_size > 0 or p_len - len(f0) - pad_size > 0:
+                     f0 = np.pad(
+                         f0, [[pad_size, p_len - len(f0) - pad_size]], mode="constant"
+                     )
+             elif method == "crepe":
+                 f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max)
+                 f0 = f0[1:]  # get rid of the extra first frame
+             elif method == "crepe-tiny":
+                 f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max, "tiny")
+                 f0 = f0[1:]  # get rid of the extra first frame
+             elif method == "mangio-crepe":
+                 f0 = self.get_f0_crepe_computation(
+                     x, f0_min, f0_max, p_len, crepe_hop_length
+                 )
+             elif method == "mangio-crepe-tiny":
+                 f0 = self.get_f0_crepe_computation(
+                     x, f0_min, f0_max, p_len, crepe_hop_length, "tiny"
+                 )
+             elif method == "harvest":
+                 f0 = cache_harvest_f0(input_audio_path, self.sr, f0_max, f0_min, 10)
+                 if filter_radius > 2:
+                     f0 = signal.medfilt(f0, 3)
+                 f0 = f0[1:]  # get rid of the first frame
+             elif method == "dio":  # potentially buggy?
+                 f0, t = pyworld.dio(
+                     x.astype(np.double),
+                     fs=self.sr,
+                     f0_ceil=f0_max,
+                     f0_floor=f0_min,
+                     frame_period=10,
+                 )
+                 f0 = pyworld.stonemask(x.astype(np.double), f0, t, self.sr)
+                 f0 = signal.medfilt(f0, 3)
+                 f0 = f0[1:]
+             # elif method == "pyin": not working just yet
+             #     f0 = self.get_f0_pyin_computation(x, f0_min, f0_max)
+             # Push the result onto the stack
+             f0_computation_stack.append(f0)
+ 
+         for fc in f0_computation_stack:
+             print(len(fc))
+ 
+         print("Calculating hybrid median f0 from the stack of: %s" % str(methods))
+         if len(f0_computation_stack) == 1:
+             f0_median_hybrid = f0_computation_stack[0]
+         else:
+             f0_median_hybrid = np.nanmedian(f0_computation_stack, axis=0)
+         return f0_median_hybrid
+ 
+     def get_f0(
+         self,
+         input_audio_path,
+         x,
+         p_len,
+         f0_up_key,
+         f0_method,
+         filter_radius,
+         crepe_hop_length,
+         inp_f0=None,
+     ):
+         global input_audio_path2wav
+         time_step = self.window / self.sr * 1000
+         f0_min = 50
+         f0_max = 1100
+         f0_mel_min = 1127 * np.log(1 + f0_min / 700)
+         f0_mel_max = 1127 * np.log(1 + f0_max / 700)
+         if f0_method == "pm":
+             f0 = (
+                 parselmouth.Sound(x, self.sr)
+                 .to_pitch_ac(
+                     time_step=time_step / 1000,
+                     voicing_threshold=0.6,
+                     pitch_floor=f0_min,
+                     pitch_ceiling=f0_max,
+                 )
+                 .selected_array["frequency"]
+             )
+             pad_size = (p_len - len(f0) + 1) // 2
+             if pad_size > 0 or p_len - len(f0) - pad_size > 0:
+                 f0 = np.pad(
+                     f0, [[pad_size, p_len - len(f0) - pad_size]], mode="constant"
+                 )
+         elif f0_method == "harvest":
+             input_audio_path2wav[input_audio_path] = x.astype(np.double)
+             f0 = cache_harvest_f0(input_audio_path, self.sr, f0_max, f0_min, 10)
+             if filter_radius > 2:
+                 f0 = signal.medfilt(f0, 3)
+         elif f0_method == "dio":  # potentially buggy?
+             f0, t = pyworld.dio(
+                 x.astype(np.double),
+                 fs=self.sr,
+                 f0_ceil=f0_max,
+                 f0_floor=f0_min,
+                 frame_period=10,
+             )
+             f0 = pyworld.stonemask(x.astype(np.double), f0, t, self.sr)
+             f0 = signal.medfilt(f0, 3)
+         elif f0_method == "crepe":
+             f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max)
+         elif f0_method == "crepe-tiny":
+             f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max, "tiny")
+         elif f0_method == "mangio-crepe":
+             f0 = self.get_f0_crepe_computation(
+                 x, f0_min, f0_max, p_len, crepe_hop_length
+             )
+         elif f0_method == "mangio-crepe-tiny":
+             f0 = self.get_f0_crepe_computation(
+                 x, f0_min, f0_max, p_len, crepe_hop_length, "tiny"
+             )
+         elif f0_method == "rmvpe":
+             if not hasattr(self, "model_rmvpe"):
+                 from rmvpe import RMVPE
+ 
+                 print("loading rmvpe model")
+                 self.model_rmvpe = RMVPE(
+                     "rmvpe.pt", is_half=self.is_half, device=self.device
+                 )
+             f0 = self.model_rmvpe.infer_from_audio(x, thred=0.03)
+         elif "hybrid" in f0_method:
+             # Perform hybrid median pitch estimation
+             input_audio_path2wav[input_audio_path] = x.astype(np.double)
+             f0 = self.get_f0_hybrid_computation(
+                 f0_method,
+                 input_audio_path,
+                 x,
+                 f0_min,
+                 f0_max,
+                 p_len,
+                 filter_radius,
+                 crepe_hop_length,
+                 time_step,
+             )
+ 
+         f0 *= pow(2, f0_up_key / 12)
+         tf0 = self.sr // self.window  # f0 points per second
+         if inp_f0 is not None:
+             delta_t = np.round(
+                 (inp_f0[:, 0].max() - inp_f0[:, 0].min()) * tf0 + 1
+             ).astype("int16")
+             replace_f0 = np.interp(
+                 list(range(delta_t)), inp_f0[:, 0] * 100, inp_f0[:, 1]
+             )
+             shape = f0[self.x_pad * tf0 : self.x_pad * tf0 + len(replace_f0)].shape[0]
+             f0[self.x_pad * tf0 : self.x_pad * tf0 + len(replace_f0)] = replace_f0[
+                 :shape
+             ]
+         f0bak = f0.copy()
+         f0_mel = 1127 * np.log(1 + f0 / 700)
+         f0_mel[f0_mel > 0] = (f0_mel[f0_mel > 0] - f0_mel_min) * 254 / (
+             f0_mel_max - f0_mel_min
+         ) + 1
+         f0_mel[f0_mel <= 1] = 1
+         f0_mel[f0_mel > 255] = 255
+         f0_coarse = np.rint(f0_mel).astype(np.int64)  # np.int is removed in NumPy >= 1.24
+ 
+         return f0_coarse, f0bak
+ 
+     def vc(
+         self,
+         model,
+         net_g,
+         sid,
+         audio0,
+         pitch,
+         pitchf,
+         times,
+         index,
+         big_npy,
+         index_rate,
+         version,
+         protect,
+     ):
+         feats = torch.from_numpy(audio0)
+         if self.is_half:
+             feats = feats.half()
+         else:
+             feats = feats.float()
+         if feats.dim() == 2:  # double channels
+             feats = feats.mean(-1)
+         assert feats.dim() == 1, feats.dim()
+         feats = feats.view(1, -1)
+         padding_mask = torch.BoolTensor(feats.shape).to(self.device).fill_(False)
+ 
+         inputs = {
+             "source": feats.to(self.device),
+             "padding_mask": padding_mask,
+             "output_layer": 9 if version == "v1" else 12,
+         }
+         t0 = ttime()
+         with torch.no_grad():
+             logits = model.extract_features(**inputs)
+             feats = model.final_proj(logits[0]) if version == "v1" else logits[0]
+         if protect < 0.5 and pitch is not None and pitchf is not None:
+             feats0 = feats.clone()
+         if index is not None and big_npy is not None and index_rate != 0:
+             npy = feats[0].cpu().numpy()
+             if self.is_half:
+                 npy = npy.astype("float32")
+ 
+             # Blend each frame with its 8 nearest index neighbours,
+             # weighted by inverse squared distance
+             score, ix = index.search(npy, k=8)
+             weight = np.square(1 / score)
+             weight /= weight.sum(axis=1, keepdims=True)
+             npy = np.sum(big_npy[ix] * np.expand_dims(weight, axis=2), axis=1)
+ 
+             if self.is_half:
+                 npy = npy.astype("float16")
+             feats = (
+                 torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate
+                 + (1 - index_rate) * feats
+             )
+ 
+         feats = F.interpolate(feats.permute(0, 2, 1), scale_factor=2).permute(0, 2, 1)
+         if protect < 0.5 and pitch is not None and pitchf is not None:
+             feats0 = F.interpolate(feats0.permute(0, 2, 1), scale_factor=2).permute(
+                 0, 2, 1
+             )
+         t1 = ttime()
+         p_len = audio0.shape[0] // self.window
+         if feats.shape[1] < p_len:
+             p_len = feats.shape[1]
+             if pitch is not None and pitchf is not None:
+                 pitch = pitch[:, :p_len]
+                 pitchf = pitchf[:, :p_len]
+ 
+         if protect < 0.5 and pitch is not None and pitchf is not None:
+             pitchff = pitchf.clone()
+             pitchff[pitchf > 0] = 1
+             pitchff[pitchf < 1] = protect
+             pitchff = pitchff.unsqueeze(-1)
+             feats = feats * pitchff + feats0 * (1 - pitchff)
+             feats = feats.to(feats0.dtype)
+         p_len = torch.tensor([p_len], device=self.device).long()
+         with torch.no_grad():
+             if pitch is not None and pitchf is not None:
+                 audio1 = (
+                     (net_g.infer(feats, p_len, pitch, pitchf, sid)[0][0, 0])
+                     .data.cpu()
+                     .float()
+                     .numpy()
+                 )
+             else:
+                 audio1 = (
+                     (net_g.infer(feats, p_len, sid)[0][0, 0]).data.cpu().float().numpy()
+                 )
+         del feats, p_len, padding_mask
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+         t2 = ttime()
+         times[0] += t1 - t0
+         times[2] += t2 - t1
+         return audio1
+ 
+     def pipeline(
+         self,
+         model,
+         net_g,
+         sid,
+         audio,
+         input_audio_path,
+         times,
+         f0_up_key,
+         f0_method,
+         file_index,
+         index_rate,
+         if_f0,
+         filter_radius,
+         tgt_sr,
+         resample_sr,
+         rms_mix_rate,
+         version,
+         protect,
+         crepe_hop_length,
+         f0_file=None,
+     ):
+         if (
+             file_index != ""
+             and os.path.exists(file_index)
+             and index_rate != 0
+         ):
+             try:
+                 index = faiss.read_index(file_index)
+                 big_npy = index.reconstruct_n(0, index.ntotal)
+             except Exception:
+                 traceback.print_exc()
+                 index = big_npy = None
+         else:
+             index = big_npy = None
+         audio = signal.filtfilt(bh, ah, audio)
+         audio_pad = np.pad(audio, (self.window // 2, self.window // 2), mode="reflect")
+         opt_ts = []
+         if audio_pad.shape[0] > self.t_max:
+             audio_sum = np.zeros_like(audio)
+             for i in range(self.window):
+                 audio_sum += audio_pad[i : i - self.window]
+             # Cut at the quietest sample within each query window
+             for t in range(self.t_center, audio.shape[0], self.t_center):
+                 opt_ts.append(
+                     t
+                     - self.t_query
+                     + np.where(
+                         np.abs(audio_sum[t - self.t_query : t + self.t_query])
+                         == np.abs(audio_sum[t - self.t_query : t + self.t_query]).min()
+                     )[0][0]
+                 )
+         s = 0
+         audio_opt = []
+         t = None
+         t1 = ttime()
+         audio_pad = np.pad(audio, (self.t_pad, self.t_pad), mode="reflect")
+         p_len = audio_pad.shape[0] // self.window
+         inp_f0 = None
+         if hasattr(f0_file, "name"):
+             try:
+                 with open(f0_file.name, "r") as f:
+                     lines = f.read().strip("\n").split("\n")
+                 inp_f0 = []
+                 for line in lines:
+                     inp_f0.append([float(i) for i in line.split(",")])
+                 inp_f0 = np.array(inp_f0, dtype="float32")
+             except Exception:
+                 traceback.print_exc()
+         sid = torch.tensor(sid, device=self.device).unsqueeze(0).long()
+         pitch, pitchf = None, None
+         if if_f0 == 1:
+             pitch, pitchf = self.get_f0(
+                 input_audio_path,
+                 audio_pad,
+                 p_len,
+                 f0_up_key,
+                 f0_method,
+                 filter_radius,
+                 crepe_hop_length,
+                 inp_f0,
+             )
+             pitch = pitch[:p_len]
+             pitchf = pitchf[:p_len]
+             if self.device == "mps":
+                 pitchf = pitchf.astype(np.float32)
+             pitch = torch.tensor(pitch, device=self.device).unsqueeze(0).long()
+             pitchf = torch.tensor(pitchf, device=self.device).unsqueeze(0).float()
+         t2 = ttime()
+         times[1] += t2 - t1
+         for t in opt_ts:
+             t = t // self.window * self.window
+             if if_f0 == 1:
+                 audio_opt.append(
+                     self.vc(
+                         model,
+                         net_g,
+                         sid,
+                         audio_pad[s : t + self.t_pad2 + self.window],
+                         pitch[:, s // self.window : (t + self.t_pad2) // self.window],
+                         pitchf[:, s // self.window : (t + self.t_pad2) // self.window],
+                         times,
+                         index,
+                         big_npy,
+                         index_rate,
+                         version,
+                         protect,
+                     )[self.t_pad_tgt : -self.t_pad_tgt]
+                 )
+             else:
+                 audio_opt.append(
+                     self.vc(
+                         model,
+                         net_g,
+                         sid,
+                         audio_pad[s : t + self.t_pad2 + self.window],
+                         None,
+                         None,
+                         times,
+                         index,
+                         big_npy,
+                         index_rate,
+                         version,
+                         protect,
+                     )[self.t_pad_tgt : -self.t_pad_tgt]
+                 )
+             s = t
+         if if_f0 == 1:
+             audio_opt.append(
+                 self.vc(
+                     model,
+                     net_g,
+                     sid,
+                     audio_pad[t:],
+                     pitch[:, t // self.window :] if t is not None else pitch,
+                     pitchf[:, t // self.window :] if t is not None else pitchf,
+                     times,
+                     index,
+                     big_npy,
+                     index_rate,
+                     version,
+                     protect,
+                 )[self.t_pad_tgt : -self.t_pad_tgt]
+             )
+         else:
+             audio_opt.append(
+                 self.vc(
+                     model,
+                     net_g,
+                     sid,
+                     audio_pad[t:],
+                     None,
+                     None,
+                     times,
+                     index,
+                     big_npy,
+                     index_rate,
+                     version,
+                     protect,
+                 )[self.t_pad_tgt : -self.t_pad_tgt]
+             )
+         audio_opt = np.concatenate(audio_opt)
+         if rms_mix_rate != 1:
+             audio_opt = change_rms(audio, 16000, audio_opt, tgt_sr, rms_mix_rate)
+         if resample_sr >= 16000 and tgt_sr != resample_sr:
+             audio_opt = librosa.resample(
+                 audio_opt, orig_sr=tgt_sr, target_sr=resample_sr
+             )
+         audio_max = np.abs(audio_opt).max() / 0.99
+         max_int16 = 32768
+         if audio_max > 1:
+             max_int16 /= audio_max
+         audio_opt = (audio_opt * max_int16).astype(np.int16)
+         del pitch, pitchf, sid
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+         return audio_opt
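`change_rms` near the top of this file blends loudness envelopes with the gain `rms1 ** (1 - rate) * rms2 ** (rate - 1)`: at `rate=1` the converted output keeps its own loudness, at `rate=0` it is rescaled to the input's. A scalar sketch of that blend (the function name here is illustrative, not part of the pipeline):

```python
import numpy as np

def rms_blend_gain(rms_in, rms_out, rate):
    # rate = 1 keeps the output's own loudness (gain 1);
    # rate = 0 matches the input's loudness (gain rms_in / rms_out)
    return (rms_in ** (1 - rate)) * (rms_out ** (rate - 1))

assert np.isclose(rms_blend_gain(0.2, 0.1, 1.0), 1.0)
assert np.isclose(rms_blend_gain(0.2, 0.1, 0.0), 2.0)
# Halfway, the gain is the geometric mean of the two extremes
assert np.isclose(rms_blend_gain(0.2, 0.1, 0.5), np.sqrt(2.0))
```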