lainlives committed on
Commit 0a0c4f3 · 1 Parent(s): 83bd5b3
Files changed (1)
  1. config/Default Configuration.json +2461 -0
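The added file nests its settings as tab, then mode, then component, then properties (for example `song.one_click.index_rate.value`). As a rough illustration of how such a file could be read and sanity-checked, here is a minimal Python sketch. The helper names `lookup` and `out_of_range` are hypothetical and not part of the repository, and the embedded excerpt is a small subset of entries copied from this diff rather than the full file.

```python
import json

# Small excerpt copied from the diff; the real file has many more entries.
EXCERPT = """
{
  "song": {
    "one_click": {
      "index_rate": {"label": "Index rate", "value": 0.2,
                     "minimum": 0.0, "maximum": 1.0, "step": null},
      "protect_rate": {"label": "Protect rate", "value": 0.149,
                       "minimum": 0.0, "maximum": 0.5, "step": null},
      "f0_methods": {"label": "Pitch extraction algorithm(s)",
                     "value": ["rmvpe", "crepe", "fcpe"],
                     "choices": ["rmvpe", "crepe", "crepe-tiny", "fcpe"],
                     "multiselect": true}
    }
  }
}
"""


def lookup(config, dotted_path):
    """Hypothetical helper: walk a dotted path such as 'song.one_click.index_rate'."""
    node = config
    for key in dotted_path.split("."):
        node = node[key]
    return node


def out_of_range(config, prefix=""):
    """Hypothetical helper: yield dotted paths of entries whose numeric
    'value' lies outside the declared [minimum, maximum] interval."""
    for key, node in config.items():
        if not isinstance(node, dict):
            continue
        path = f"{prefix}.{key}" if prefix else key
        if "minimum" in node and "maximum" in node:
            value = node.get("value")
            if isinstance(value, (int, float)) and not (
                node["minimum"] <= value <= node["maximum"]
            ):
                yield path
        else:
            # Not a slider-like entry: descend into nested groups.
            yield from out_of_range(node, path)


config = json.loads(EXCERPT)
index_rate_default = lookup(config, "song.one_click.index_rate")["value"]
violations = list(out_of_range(config))
```

To check the whole file instead of the excerpt, the same helpers could be applied to the result of `json.load` on `config/Default Configuration.json`, assuming the file is plain JSON as the diff suggests.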
config/Default Configuration.json ADDED
@@ -0,0 +1,2461 @@
1
+ {
2
+ "song": {
3
+ "one_click": {
4
+ "embedder_model": {
5
+ "label": "Embedder model",
6
+ "info": "The model to use for generating speaker embeddings.",
7
+ "value": "contentvec",
8
+ "choices": [
9
+ "contentvec",
10
+ "chinese-hubert-base",
11
+ "japanese-hubert-base",
12
+ "korean-hubert-base",
13
+ "custom"
14
+ ],
15
+ "multiselect": null,
16
+ "allow_custom_value": false,
17
+ "type": "value",
18
+ "visible": true,
19
+ "scale": null,
20
+ "render": true,
21
+ "exclude_value": true
22
+ },
23
+ "custom_embedder_model": {
24
+ "label": "Custom embedder model",
25
+ "info": "Select a custom embedder model from the dropdown.",
26
+ "value": null,
27
+ "choices": null,
28
+ "multiselect": null,
29
+ "allow_custom_value": false,
30
+ "type": "value",
31
+ "visible": false,
32
+ "scale": null,
33
+ "render": false,
34
+ "exclude_value": true
35
+ },
36
+ "voice_model": {
37
+ "label": "Voice model",
38
+ "info": "Select a model to use for voice conversion.",
39
+ "value": null,
40
+ "choices": null,
41
+ "multiselect": null,
42
+ "allow_custom_value": false,
43
+ "type": "value",
44
+ "visible": true,
45
+ "scale": null,
46
+ "render": false,
47
+ "exclude_value": true
48
+ },
49
+ "f0_methods": {
50
+ "label": "Pitch extraction algorithm(s)",
51
+ "info": "If more than one method is selected, then the median of the pitch values extracted by each method is used. RMVPE is recommended for most cases and is the default when no method is selected.",
52
+ "value": [
53
+ "rmvpe",
54
+ "crepe",
55
+ "fcpe"
56
+ ],
57
+ "choices": [
58
+ "rmvpe",
59
+ "crepe",
60
+ "crepe-tiny",
61
+ "fcpe"
62
+ ],
63
+ "multiselect": true,
64
+ "allow_custom_value": false,
65
+ "type": "value",
66
+ "visible": true,
67
+ "scale": null,
68
+ "render": true,
69
+ "exclude_value": false
70
+ },
71
+ "index_rate": {
72
+ "label": "Index rate",
73
+ "info": "Increase to bias the conversion towards the accent of the voice model. Decrease to potentially reduce artifacts coming from the voice model.<br><br><br>",
74
+ "value": 0.2,
75
+ "minimum": 0.0,
76
+ "maximum": 1.0,
77
+ "step": null,
78
+ "visible": true,
79
+ "scale": null,
80
+ "render": true,
81
+ "exclude_value": false
82
+ },
83
+ "rms_mix_rate": {
84
+ "label": "RMS mix rate",
85
+ "info": "How much to mimic the loudness (0) of the input voice or a fixed loudness (1). A value of 1 is recommended for most cases.<br><br>",
86
+ "value": 1,
87
+ "minimum": 0.0,
88
+ "maximum": 1.0,
89
+ "step": null,
90
+ "visible": true,
91
+ "scale": null,
92
+ "render": true,
93
+ "exclude_value": false
94
+ },
95
+ "protect_rate": {
96
+ "label": "Protect rate",
97
+ "info": "Controls the extent to which consonants and breathing sounds are protected from artifacts. A higher value offers more protection but may worsen the indexing effect.<br><br>",
98
+ "value": 0.149,
99
+ "minimum": 0.0,
100
+ "maximum": 0.5,
101
+ "step": null,
102
+ "visible": true,
103
+ "scale": null,
104
+ "render": true,
105
+ "exclude_value": false
106
+ },
107
+ "hop_length": {
108
+ "label": "Hop length",
109
+ "info": "How often the CREPE-based pitch extraction method checks for pitch changes measured in milliseconds. Lower values lead to longer conversion times and a higher risk of voice cracks, but better pitch accuracy.",
110
+ "value": 128,
111
+ "minimum": 1.0,
112
+ "maximum": 512.0,
113
+ "step": 1.0,
114
+ "visible": true,
115
+ "scale": null,
116
+ "render": true,
117
+ "exclude_value": false
118
+ },
119
+ "split_voice": {
120
+ "label": "Split input voice",
121
+ "info": "Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.",
122
+ "value": false,
123
+ "visible": true,
124
+ "scale": null,
125
+ "render": true,
126
+ "exclude_value": false
127
+ },
128
+ "autotune_voice": {
129
+ "label": "Autotune converted voice",
130
+ "info": "Whether to apply autotune to the converted voice.<br><br>",
131
+ "value": false,
132
+ "visible": true,
133
+ "scale": null,
134
+ "render": true,
135
+ "exclude_value": true
136
+ },
137
+ "autotune_strength": {
138
+ "label": "Autotune intensity",
139
+ "info": "Higher values result in stronger snapping to the chromatic grid and artifacting.",
140
+ "value": 0.69,
141
+ "minimum": 0.0,
142
+ "maximum": 1.0,
143
+ "step": null,
144
+ "visible": false,
145
+ "scale": null,
146
+ "render": true,
147
+ "exclude_value": false
148
+ },
149
+ "sid": {
150
+ "label": "Speaker ID",
151
+ "info": "Speaker ID for multi-speaker-models.",
152
+ "value": 0,
153
+ "precision": 0,
154
+ "visible": true,
155
+ "scale": null,
156
+ "render": true,
157
+ "exclude_value": false
158
+ },
159
+ "output_sr": {
160
+ "label": "Output sample rate",
161
+ "info": "The sample rate of the mixed output track.",
162
+ "value": 44100,
163
+ "choices": [
164
+ 16000,
165
+ 44100,
166
+ 48000,
167
+ 96000,
168
+ 192000
169
+ ],
170
+ "multiselect": null,
171
+ "allow_custom_value": false,
172
+ "type": "value",
173
+ "visible": true,
174
+ "scale": null,
175
+ "render": true,
176
+ "exclude_value": false
177
+ },
178
+ "output_format": {
179
+ "label": "Output format",
180
+ "info": "The audio format of the mixed output track.",
181
+ "value": "mp3",
182
+ "choices": [
183
+ "mp3",
184
+ "wav",
185
+ "flac",
186
+ "ogg",
187
+ "m4a",
188
+ "aac"
189
+ ],
190
+ "multiselect": null,
191
+ "allow_custom_value": false,
192
+ "type": "value",
193
+ "visible": true,
194
+ "scale": null,
195
+ "render": true,
196
+ "exclude_value": false
197
+ },
198
+ "output_name": {
199
+ "label": "Output name",
200
+ "info": "If no name is provided, a suitable name will be generated automatically.",
201
+ "value": null,
202
+ "visible": true,
203
+ "scale": null,
204
+ "render": true,
205
+ "exclude_value": true,
206
+ "placeholder": "Ultimate RVC output"
207
+ },
208
+ "source_type": {
209
+ "label": "Source type",
210
+ "info": "The type of source to retrieve a song from.",
211
+ "value": "YouTube link/local path",
212
+ "choices": [
213
+ "YouTube link/local path",
214
+ "Local file",
215
+ "Cached song"
216
+ ],
217
+ "multiselect": null,
218
+ "allow_custom_value": false,
219
+ "type": "index",
220
+ "visible": true,
221
+ "scale": null,
222
+ "render": true,
223
+ "exclude_value": true
224
+ },
225
+ "source": {
226
+ "label": "Source",
227
+ "info": "Link to a song on YouTube or the full path of a local audio file.",
228
+ "value": null,
229
+ "visible": true,
230
+ "scale": null,
231
+ "render": true,
232
+ "exclude_value": true,
233
+ "placeholder": null
234
+ },
235
+ "cached_song": {
236
+ "label": "Source",
237
+ "info": "Select a song from the list of cached songs.",
238
+ "value": null,
239
+ "choices": null,
240
+ "multiselect": null,
241
+ "allow_custom_value": false,
242
+ "type": "value",
243
+ "visible": false,
244
+ "scale": null,
245
+ "render": false,
246
+ "exclude_value": true
247
+ },
248
+ "clean_voice": {
249
+ "label": "Clean converted voice",
250
+ "info": "Whether to clean the converted voice using noise reduction algorithms.<br><br>",
251
+ "value": false,
252
+ "visible": true,
253
+ "scale": null,
254
+ "render": true,
255
+ "exclude_value": true
256
+ },
257
+ "clean_strength": {
258
+ "label": "Cleaning intensity",
259
+ "info": "Higher values result in stronger cleaning, but may lead to a more compressed sound.",
260
+ "value": 0.7,
261
+ "minimum": 0.0,
262
+ "maximum": 1.0,
263
+ "step": 0.1,
264
+ "visible": false,
265
+ "scale": null,
266
+ "render": true,
267
+ "exclude_value": false
268
+ },
269
+ "room_size": {
270
+ "label": "Room size",
271
+ "info": "Size of the room which reverb effect simulates. Increase for longer reverb time.",
272
+ "value": 0.15,
273
+ "minimum": 0.0,
274
+ "maximum": 1.0,
275
+ "step": null,
276
+ "visible": true,
277
+ "scale": null,
278
+ "render": true,
279
+ "exclude_value": false
280
+ },
281
+ "wet_level": {
282
+ "label": "Wetness level",
283
+ "info": "Loudness of converted vocals with reverb effect applied.",
284
+ "value": 0.2,
285
+ "minimum": 0.0,
286
+ "maximum": 1.0,
287
+ "step": null,
288
+ "visible": true,
289
+ "scale": null,
290
+ "render": true,
291
+ "exclude_value": false
292
+ },
293
+ "dry_level": {
294
+ "label": "Dryness level",
295
+ "info": "Loudness of converted vocals without reverb effect applied.",
296
+ "value": 0.8,
297
+ "minimum": 0.0,
298
+ "maximum": 1.0,
299
+ "step": null,
300
+ "visible": true,
301
+ "scale": null,
302
+ "render": true,
303
+ "exclude_value": false
304
+ },
305
+ "damping": {
306
+ "label": "Damping level",
307
+ "info": "Absorption of high frequencies in reverb effect.",
308
+ "value": 0.7,
309
+ "minimum": 0.0,
310
+ "maximum": 1.0,
311
+ "step": null,
312
+ "visible": true,
313
+ "scale": null,
314
+ "render": true,
315
+ "exclude_value": false
316
+ },
317
+ "main_gain": {
318
+ "label": "Main gain",
319
+ "info": "The gain to apply to the main vocals.",
320
+ "value": 0,
321
+ "minimum": -20.0,
322
+ "maximum": 20.0,
323
+ "step": 1.0,
324
+ "visible": true,
325
+ "scale": null,
326
+ "render": true,
327
+ "exclude_value": false
328
+ },
329
+ "inst_gain": {
330
+ "label": "Instrumentals gain",
331
+ "info": "The gain to apply to the instrumentals.",
332
+ "value": 0,
333
+ "minimum": -20.0,
334
+ "maximum": 20.0,
335
+ "step": 1.0,
336
+ "visible": true,
337
+ "scale": null,
338
+ "render": true,
339
+ "exclude_value": false
340
+ },
341
+ "backup_gain": {
342
+ "label": "Backup gain",
343
+ "info": "The gain to apply to the backup vocals.",
344
+ "value": 0,
345
+ "minimum": -20.0,
346
+ "maximum": 20.0,
347
+ "step": 1.0,
348
+ "visible": true,
349
+ "scale": null,
350
+ "render": true,
351
+ "exclude_value": false
352
+ },
353
+ "n_octaves": {
354
+ "label": "Vocal pitch shift",
355
+ "info": "The number of octaves to shift the pitch of the converted vocals by. Use 1 for male-to-female and -1 for vice-versa.",
356
+ "value": 0,
357
+ "minimum": -3.0,
358
+ "maximum": 3.0,
359
+ "step": 1.0,
360
+ "visible": true,
361
+ "scale": null,
362
+ "render": true,
363
+ "exclude_value": false
364
+ },
365
+ "n_semitones": {
366
+ "label": "Overall pitch shift",
367
+ "info": "The number of semi-tones to shift the pitch of the converted vocals, instrumentals and backup vocals by.",
368
+ "value": 0,
369
+ "minimum": -12.0,
370
+ "maximum": 12.0,
371
+ "step": 1.0,
372
+ "visible": true,
373
+ "scale": null,
374
+ "render": true,
375
+ "exclude_value": false
376
+ },
377
+ "show_intermediate_audio": {
378
+ "label": "Show intermediate audio",
379
+ "info": "Show intermediate audio tracks produced during song cover generation.",
380
+ "value": false,
381
+ "visible": true,
382
+ "scale": null,
383
+ "render": true,
384
+ "exclude_value": true
385
+ },
386
+ "intermediate_audio": {
387
+ "song": {
388
+ "label": "Song",
389
+ "value": null,
390
+ "visible": true,
391
+ "scale": null,
392
+ "render": true,
393
+ "exclude_value": true,
394
+ "interactive": null
395
+ },
396
+ "vocals": {
397
+ "label": "Vocals",
398
+ "value": null,
399
+ "visible": true,
400
+ "scale": null,
401
+ "render": true,
402
+ "exclude_value": true,
403
+ "interactive": null
404
+ },
405
+ "instrumentals": {
406
+ "label": "Instrumentals",
407
+ "value": null,
408
+ "visible": true,
409
+ "scale": null,
410
+ "render": true,
411
+ "exclude_value": true,
412
+ "interactive": null
413
+ },
414
+ "main_vocals": {
415
+ "label": "Main vocals",
416
+ "value": null,
417
+ "visible": true,
418
+ "scale": null,
419
+ "render": true,
420
+ "exclude_value": true,
421
+ "interactive": null
422
+ },
423
+ "backup_vocals": {
424
+ "label": "Backup vocals",
425
+ "value": null,
426
+ "visible": true,
427
+ "scale": null,
428
+ "render": true,
429
+ "exclude_value": true,
430
+ "interactive": null
431
+ },
432
+ "main_vocals_dereverbed": {
433
+ "label": "De-reverbed main vocals",
434
+ "value": null,
435
+ "visible": true,
436
+ "scale": null,
437
+ "render": true,
438
+ "exclude_value": true,
439
+ "interactive": null
440
+ },
441
+ "main_vocals_reverb": {
442
+ "label": "Main vocals with reverb",
443
+ "value": null,
444
+ "visible": true,
445
+ "scale": null,
446
+ "render": true,
447
+ "exclude_value": true,
448
+ "interactive": null
449
+ },
450
+ "converted_vocals": {
451
+ "label": "Converted vocals",
452
+ "value": null,
453
+ "visible": true,
454
+ "scale": null,
455
+ "render": true,
456
+ "exclude_value": true,
457
+ "interactive": null
458
+ },
459
+ "postprocessed_vocals": {
460
+ "label": "Postprocessed vocals",
461
+ "value": null,
462
+ "visible": true,
463
+ "scale": null,
464
+ "render": true,
465
+ "exclude_value": true,
466
+ "interactive": null
467
+ },
468
+ "instrumentals_shifted": {
469
+ "label": "Pitch-shifted instrumentals",
470
+ "value": null,
471
+ "visible": true,
472
+ "scale": null,
473
+ "render": true,
474
+ "exclude_value": true,
475
+ "interactive": null
476
+ },
477
+ "backup_vocals_shifted": {
478
+ "label": "Pitch-shifted backup vocals",
479
+ "value": null,
480
+ "visible": true,
481
+ "scale": null,
482
+ "render": true,
483
+ "exclude_value": true,
484
+ "interactive": null
485
+ }
486
+ }
487
+ },
488
+ "multi_step": {
489
+ "embedder_model": {
490
+ "label": "Embedder model",
491
+ "info": "The model to use for generating speaker embeddings.",
492
+ "value": "contentvec",
493
+ "choices": [
494
+ "contentvec",
495
+ "chinese-hubert-base",
496
+ "japanese-hubert-base",
497
+ "korean-hubert-base",
498
+ "custom"
499
+ ],
500
+ "multiselect": null,
501
+ "allow_custom_value": false,
502
+ "type": "value",
503
+ "visible": true,
504
+ "scale": null,
505
+ "render": true,
506
+ "exclude_value": true
507
+ },
508
+ "custom_embedder_model": {
509
+ "label": "Custom embedder model",
510
+ "info": "Select a custom embedder model from the dropdown.",
511
+ "value": null,
512
+ "choices": null,
513
+ "multiselect": null,
514
+ "allow_custom_value": false,
515
+ "type": "value",
516
+ "visible": false,
517
+ "scale": null,
518
+ "render": false,
519
+ "exclude_value": true
520
+ },
521
+ "voice_model": {
522
+ "label": "Voice model",
523
+ "info": "Select a model to use for voice conversion.",
524
+ "value": null,
525
+ "choices": null,
526
+ "multiselect": null,
527
+ "allow_custom_value": false,
528
+ "type": "value",
529
+ "visible": true,
530
+ "scale": null,
531
+ "render": false,
532
+ "exclude_value": true
533
+ },
534
+ "f0_methods": {
535
+ "label": "Pitch extraction algorithm(s)",
536
+ "info": "If more than one method is selected, then the median of the pitch values extracted by each method is used. RMVPE is recommended for most cases and is the default when no method is selected.",
537
+ "value": [
538
+ "rmvpe"
539
+ ],
540
+ "choices": [
541
+ "rmvpe",
542
+ "crepe",
543
+ "crepe-tiny",
544
+ "fcpe"
545
+ ],
546
+ "multiselect": true,
547
+ "allow_custom_value": false,
548
+ "type": "value",
549
+ "visible": true,
550
+ "scale": null,
551
+ "render": true,
552
+ "exclude_value": false
553
+ },
554
+ "index_rate": {
555
+ "label": "Index rate",
556
+ "info": "Increase to bias the conversion towards the accent of the voice model. Decrease to potentially reduce artifacts coming from the voice model.<br><br><br>",
557
+ "value": 0.3,
558
+ "minimum": 0.0,
559
+ "maximum": 1.0,
560
+ "step": null,
561
+ "visible": true,
562
+ "scale": null,
563
+ "render": true,
564
+ "exclude_value": false
565
+ },
566
+ "rms_mix_rate": {
567
+ "label": "RMS mix rate",
568
+ "info": "How much to mimic the loudness (0) of the input voice or a fixed loudness (1). A value of 1 is recommended for most cases.<br><br>",
569
+ "value": 1,
570
+ "minimum": 0.0,
571
+ "maximum": 1.0,
572
+ "step": null,
573
+ "visible": true,
574
+ "scale": null,
575
+ "render": true,
576
+ "exclude_value": false
577
+ },
578
+ "protect_rate": {
579
+ "label": "Protect rate",
580
+ "info": "Controls the extent to which consonants and breathing sounds are protected from artifacts. A higher value offers more protection but may worsen the indexing effect.<br><br>",
581
+ "value": 0.33,
582
+ "minimum": 0.0,
583
+ "maximum": 0.5,
584
+ "step": null,
585
+ "visible": true,
586
+ "scale": null,
587
+ "render": true,
588
+ "exclude_value": false
589
+ },
590
+ "hop_length": {
591
+ "label": "Hop length",
592
+ "info": "How often the CREPE-based pitch extraction method checks for pitch changes measured in milliseconds. Lower values lead to longer conversion times and a higher risk of voice cracks, but better pitch accuracy.",
593
+ "value": 128,
594
+ "minimum": 1.0,
595
+ "maximum": 512.0,
596
+ "step": 1.0,
597
+ "visible": true,
598
+ "scale": null,
599
+ "render": true,
600
+ "exclude_value": false
601
+ },
602
+ "split_voice": {
603
+ "label": "Split input voice",
604
+ "info": "Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.",
605
+ "value": false,
606
+ "visible": true,
607
+ "scale": null,
608
+ "render": true,
609
+ "exclude_value": false
610
+ },
611
+ "autotune_voice": {
612
+ "label": "Autotune converted voice",
613
+ "info": "Whether to apply autotune to the converted voice.<br><br>",
614
+ "value": false,
615
+ "visible": true,
616
+ "scale": null,
617
+ "render": true,
618
+ "exclude_value": true
619
+ },
620
+ "autotune_strength": {
621
+ "label": "Autotune intensity",
622
+ "info": "Higher values result in stronger snapping to the chromatic grid and artifacting.",
623
+ "value": 1,
624
+ "minimum": 0.0,
625
+ "maximum": 1.0,
626
+ "step": null,
627
+ "visible": false,
628
+ "scale": null,
629
+ "render": true,
630
+ "exclude_value": false
631
+ },
632
+ "sid": {
633
+ "label": "Speaker ID",
634
+ "info": "Speaker ID for multi-speaker-models.",
635
+ "value": 0,
636
+ "precision": 0,
637
+ "visible": true,
638
+ "scale": null,
639
+ "render": true,
640
+ "exclude_value": false
641
+ },
642
+ "output_sr": {
643
+ "label": "Output sample rate",
644
+ "info": "The sample rate of the mixed output track.",
645
+ "value": 44100,
646
+ "choices": [
647
+ 16000,
648
+ 44100,
649
+ 48000,
650
+ 96000,
651
+ 192000
652
+ ],
653
+ "multiselect": null,
654
+ "allow_custom_value": false,
655
+ "type": "value",
656
+ "visible": true,
657
+ "scale": null,
658
+ "render": true,
659
+ "exclude_value": false
660
+ },
661
+ "output_format": {
662
+ "label": "Output format",
663
+ "info": "The audio format of the mixed output track.",
664
+ "value": "mp3",
665
+ "choices": [
666
+ "mp3",
667
+ "wav",
668
+ "flac",
669
+ "ogg",
670
+ "m4a",
671
+ "aac"
672
+ ],
673
+ "multiselect": null,
674
+ "allow_custom_value": false,
675
+ "type": "value",
676
+ "visible": true,
677
+ "scale": null,
678
+ "render": true,
679
+ "exclude_value": false
680
+ },
681
+ "output_name": {
682
+ "label": "Output name",
683
+ "info": "If no name is provided, a suitable name will be generated automatically.",
684
+ "value": null,
685
+ "visible": true,
686
+ "scale": null,
687
+ "render": true,
688
+ "exclude_value": true,
689
+ "placeholder": "Ultimate RVC output"
690
+ },
691
+ "source_type": {
692
+ "label": "Source type",
693
+ "info": "The type of source to retrieve a song from.",
694
+ "value": "YouTube link/local path",
695
+ "choices": [
696
+ "YouTube link/local path",
697
+ "Local file",
698
+ "Cached song"
699
+ ],
700
+ "multiselect": null,
701
+ "allow_custom_value": false,
702
+ "type": "index",
703
+ "visible": true,
704
+ "scale": null,
705
+ "render": true,
706
+ "exclude_value": true
707
+ },
708
+ "source": {
709
+ "label": "Source",
710
+ "info": "Link to a song on YouTube or the full path of a local audio file.",
711
+ "value": null,
712
+ "visible": true,
713
+ "scale": null,
714
+ "render": true,
715
+ "exclude_value": true,
716
+ "placeholder": null
717
+ },
718
+ "cached_song": {
719
+ "label": "Source",
720
+ "info": "Select a song from the list of cached songs.",
721
+ "value": null,
722
+ "choices": null,
723
+ "multiselect": null,
724
+ "allow_custom_value": false,
725
+ "type": "value",
726
+ "visible": false,
727
+ "scale": null,
728
+ "render": false,
729
+ "exclude_value": true
730
+ },
731
+ "clean_voice": {
732
+ "label": "Clean converted voice",
733
+ "info": "Whether to clean the converted voice using noise reduction algorithms.<br><br>",
734
+ "value": false,
735
+ "visible": true,
736
+ "scale": null,
737
+ "render": true,
738
+ "exclude_value": true
739
+ },
740
+ "clean_strength": {
741
+ "label": "Cleaning intensity",
742
+ "info": "Higher values result in stronger cleaning, but may lead to a more compressed sound.",
743
+ "value": 0.7,
744
+ "minimum": 0.0,
745
+ "maximum": 1.0,
746
+ "step": 0.1,
747
+ "visible": false,
748
+ "scale": null,
749
+ "render": true,
750
+ "exclude_value": false
751
+ },
752
+ "room_size": {
753
+ "label": "Room size",
754
+ "info": "Size of the room which reverb effect simulates. Increase for longer reverb time.",
755
+ "value": 0.15,
756
+ "minimum": 0.0,
757
+ "maximum": 1.0,
758
+ "step": null,
759
+ "visible": true,
760
+ "scale": null,
761
+ "render": true,
762
+ "exclude_value": false
763
+ },
764
+ "wet_level": {
765
+ "label": "Wetness level",
766
+ "info": "Loudness of converted vocals with reverb effect applied.",
767
+ "value": 0.2,
768
+ "minimum": 0.0,
769
+ "maximum": 1.0,
770
+ "step": null,
771
+ "visible": true,
772
+ "scale": null,
773
+ "render": true,
774
+ "exclude_value": false
775
+ },
776
+ "dry_level": {
777
+ "label": "Dryness level",
778
+ "info": "Loudness of converted vocals without reverb effect applied.",
779
+ "value": 0.8,
780
+ "minimum": 0.0,
781
+ "maximum": 1.0,
782
+ "step": null,
783
+ "visible": true,
784
+ "scale": null,
785
+ "render": true,
786
+ "exclude_value": false
787
+ },
788
+ "damping": {
789
+ "label": "Damping level",
790
+ "info": "Absorption of high frequencies in reverb effect.",
791
+ "value": 0.7,
792
+ "minimum": 0.0,
793
+ "maximum": 1.0,
794
+ "step": null,
795
+ "visible": true,
796
+ "scale": null,
797
+ "render": true,
798
+ "exclude_value": false
799
+ },
800
+ "main_gain": {
801
+ "label": "Main gain",
802
+ "info": "The gain to apply to the main vocals.",
803
+ "value": 0,
804
+ "minimum": -20.0,
805
+ "maximum": 20.0,
806
+ "step": 1.0,
807
+ "visible": true,
808
+ "scale": null,
809
+ "render": true,
810
+ "exclude_value": false
811
+ },
812
+ "inst_gain": {
813
+ "label": "Instrumentals gain",
814
+ "info": "The gain to apply to the instrumentals.",
815
+ "value": 0,
816
+ "minimum": -20.0,
817
+ "maximum": 20.0,
818
+ "step": 1.0,
819
+ "visible": true,
820
+ "scale": null,
821
+ "render": true,
822
+ "exclude_value": false
823
+ },
824
+ "backup_gain": {
825
+ "label": "Backup gain",
826
+ "info": "The gain to apply to the backup vocals.",
827
+ "value": 0,
828
+ "minimum": -20.0,
829
+ "maximum": 20.0,
830
+ "step": 1.0,
831
+ "visible": true,
832
+ "scale": null,
833
+ "render": true,
834
+ "exclude_value": false
835
+ },
836
+ "separation_model": {
837
+ "label": "Separation model",
838
+ "info": "The model to use for audio separation.",
839
+ "value": "UVR-MDX-NET-Voc_FT.onnx",
840
+ "choices": [
841
+ "UVR-MDX-NET-Voc_FT.onnx",
842
+ "UVR_MDXNET_KARA_2.onnx",
843
+ "Reverb_HQ_By_FoxJoy.onnx"
844
+ ],
845
+ "multiselect": null,
846
+ "allow_custom_value": false,
847
+ "type": "value",
848
+ "visible": true,
849
+ "scale": null,
850
+ "render": true,
851
+ "exclude_value": false
852
+ },
853
+ "segment_size": {
854
+ "label": "Segment size",
855
+ "info": "The size of the segments into which the audio is split. Using a larger size consumes more resources, but may give better results.",
856
+ "value": 512,
857
+ "choices": [
858
+ 64,
859
+ 128,
860
+ 256,
861
+ 512,
862
+ 1024,
863
+ 2048
864
+ ],
865
+ "visible": true,
866
+ "scale": null,
867
+ "render": true,
868
+ "exclude_value": false
869
+ },
870
+ "n_octaves": {
871
+ "label": "Pitch shift (octaves)",
872
+ "info": "The number of octaves to pitch-shift the converted voice by. Use 1 for male-to-female and -1 for vice-versa.",
873
+ "value": 0,
874
+ "minimum": -3.0,
875
+ "maximum": 3.0,
876
+ "step": 1.0,
877
+ "visible": true,
878
+ "scale": null,
879
+ "render": true,
880
+ "exclude_value": false
881
+ },
882
+ "n_semitones": {
883
+ "label": "Pitch shift (semi-tones)",
884
+ "info": "The number of semi-tones to pitch-shift the converted vocals by. Altering this slightly reduces sound quality.",
885
+ "value": 0,
886
+ "minimum": -12.0,
887
+ "maximum": 12.0,
888
+ "step": 1.0,
889
+ "visible": true,
890
+ "scale": null,
891
+ "render": true,
892
+ "exclude_value": false
893
+ },
894
+ "n_semitones_instrumentals": {
895
+ "label": "Instrumental pitch shift",
896
+ "info": "The number of semi-tones to pitch-shift the instrumentals by.",
897
+ "value": 0,
898
+ "minimum": -12.0,
899
+ "maximum": 12.0,
900
+ "step": 1.0,
901
+ "visible": true,
902
+ "scale": null,
903
+ "render": true,
904
+ "exclude_value": false
905
+ },
906
+ "n_semitones_backup_vocals": {
907
+ "label": "Backup vocal pitch shift",
908
+ "info": "The number of semi-tones to pitch-shift the backup vocals by.",
909
+ "value": 0,
910
+ "minimum": -12.0,
911
+ "maximum": 12.0,
912
+ "step": 1.0,
913
+ "visible": true,
914
+ "scale": null,
915
+ "render": true,
916
+ "exclude_value": false
917
+ },
918
+ "input_audio": {
919
+ "audio": {
920
+ "label": "Audio",
921
+ "value": null,
922
+ "visible": true,
923
+ "scale": null,
924
+ "render": false,
925
+ "exclude_value": true,
926
+ "interactive": null
927
+ },
928
+ "vocals": {
929
+ "label": "Vocals",
930
+ "value": null,
931
+ "visible": true,
932
+ "scale": null,
933
+ "render": false,
934
+ "exclude_value": true,
935
+ "interactive": null
936
+ },
937
+ "converted_vocals": {
938
+ "label": "Vocals",
939
+ "value": null,
940
+ "visible": true,
941
+ "scale": null,
942
+ "render": false,
943
+ "exclude_value": true,
944
+ "interactive": null
945
+ },
946
+ "instrumentals": {
947
+ "label": "Instrumentals",
948
+ "value": null,
949
+ "visible": true,
950
+ "scale": null,
951
+ "render": false,
952
+ "exclude_value": true,
953
+ "interactive": null
954
+ },
955
+ "backup_vocals": {
956
+ "label": "Backup vocals",
957
+ "value": null,
958
+ "visible": true,
959
+ "scale": null,
960
+ "render": false,
961
+ "exclude_value": true,
962
+ "interactive": null
963
+ },
964
+ "main_vocals": {
965
+ "label": "Main vocals",
966
+ "value": null,
967
+ "visible": true,
968
+ "scale": null,
969
+ "render": false,
970
+ "exclude_value": true,
971
+ "interactive": null
972
+ },
973
+ "shifted_instrumentals": {
974
+ "label": "Instrumentals",
975
+ "value": null,
976
+ "visible": true,
977
+ "scale": null,
978
+ "render": false,
979
+ "exclude_value": true,
980
+ "interactive": null
981
+ },
982
+ "shifted_backup_vocals": {
983
+ "label": "Backup vocals",
984
+ "value": null,
985
+ "visible": true,
986
+ "scale": null,
987
+ "render": false,
988
+ "exclude_value": true,
989
+ "interactive": null
990
+ }
991
+ },
992
+ "song_dirs": {
993
+ "separate_audio": {
994
+ "label": "Song directory",
995
+ "info": "Directory where intermediate audio files are stored and loaded from locally. When a new song is retrieved, its directory is chosen by default.",
996
+ "value": null,
997
+ "choices": null,
998
+ "multiselect": null,
999
+ "allow_custom_value": false,
1000
+ "type": "value",
1001
+ "visible": true,
1002
+ "scale": null,
1003
+ "render": false,
1004
+ "exclude_value": true
1005
+ },
1006
+ "convert_vocals": {
1007
+ "label": "Song directory",
1008
+ "info": "Directory where intermediate audio files are stored and loaded from locally. When a new song is retrieved, its directory is chosen by default.",
1009
+ "value": null,
1010
+ "choices": null,
1011
+ "multiselect": null,
1012
+ "allow_custom_value": false,
1013
+ "type": "value",
1014
+ "visible": true,
1015
+ "scale": null,
1016
+ "render": false,
1017
+ "exclude_value": true
1018
+ },
1019
+ "postprocess_vocals": {
1020
+ "label": "Song directory",
1021
+ "info": "Directory where intermediate audio files are stored and loaded from locally. When a new song is retrieved, its directory is chosen by default.",
1022
+ "value": null,
1023
+ "choices": null,
1024
+ "multiselect": null,
1025
+ "allow_custom_value": false,
1026
+ "type": "value",
1027
+ "visible": true,
1028
+ "scale": null,
1029
+ "render": false,
1030
+ "exclude_value": true
1031
+ },
1032
+ "pitch_shift_background": {
1033
+ "label": "Song directory",
1034
+ "info": "Directory where intermediate audio files are stored and loaded from locally. When a new song is retrieved, its directory is chosen by default.",
1035
+ "value": null,
1036
+ "choices": null,
1037
+ "multiselect": null,
1038
+ "allow_custom_value": false,
1039
+ "type": "value",
1040
+ "visible": true,
1041
+ "scale": null,
1042
+ "render": false,
1043
+ "exclude_value": true
1044
+ },
1045
+ "mix": {
1046
+ "label": "Song directory",
1047
+ "info": "Directory where intermediate audio files are stored and loaded from locally. When a new song is retrieved, its directory is chosen by default.",
1048
+ "value": null,
1049
+ "choices": null,
1050
+ "multiselect": null,
1051
+ "allow_custom_value": false,
1052
+ "type": "value",
1053
+ "visible": true,
1054
+ "scale": null,
1055
+ "render": false,
1056
+ "exclude_value": true
1057
+ }
1058
+ }
1059
+ }
1060
+ },
1061
+ "speech": {
1062
+ "one_click": {
1063
+ "embedder_model": {
1064
+ "label": "Embedder model",
1065
+ "info": "The model to use for generating speaker embeddings.",
1066
+ "value": "contentvec",
1067
+ "choices": [
1068
+ "contentvec",
1069
+ "chinese-hubert-base",
1070
+ "japanese-hubert-base",
1071
+ "korean-hubert-base",
1072
+ "custom"
1073
+ ],
1074
+ "multiselect": null,
1075
+ "allow_custom_value": false,
1076
+ "type": "value",
1077
+ "visible": true,
1078
+ "scale": null,
1079
+ "render": true,
1080
+ "exclude_value": true
1081
+ },
1082
+ "custom_embedder_model": {
1083
+ "label": "Custom embedder model",
1084
+ "info": "Select a custom embedder model from the dropdown.",
1085
+ "value": null,
1086
+ "choices": null,
1087
+ "multiselect": null,
1088
+ "allow_custom_value": false,
1089
+ "type": "value",
1090
+ "visible": false,
1091
+ "scale": null,
1092
+ "render": false,
1093
+ "exclude_value": true
1094
+ },
1095
+ "voice_model": {
1096
+ "label": "Voice model",
1097
+ "info": "Select a model to use for voice conversion.",
1098
+ "value": null,
1099
+ "choices": null,
1100
+ "multiselect": null,
1101
+ "allow_custom_value": false,
1102
+ "type": "value",
1103
+ "visible": true,
1104
+ "scale": null,
1105
+ "render": false,
1106
+ "exclude_value": true
1107
+ },
1108
+ "f0_methods": {
1109
+ "label": "Pitch extraction algorithm(s)",
1110
+ "info": "If more than one method is selected, then the median of the pitch values extracted by each method is used. RMVPE is recommended for most cases and is the default when no method is selected.",
1111
+ "value": [
1112
+ "rmvpe"
1113
+ ],
1114
+ "choices": [
1115
+ "rmvpe",
1116
+ "crepe",
1117
+ "crepe-tiny",
1118
+ "fcpe"
1119
+ ],
1120
+ "multiselect": true,
1121
+ "allow_custom_value": false,
1122
+ "type": "value",
1123
+ "visible": true,
1124
+ "scale": null,
1125
+ "render": true,
1126
+ "exclude_value": false
1127
+ },
1128
+ "index_rate": {
1129
+ "label": "Index rate",
1130
+ "info": "Increase to bias the conversion towards the accent of the voice model. Decrease to potentially reduce artifacts coming from the voice model.<br><br><br>",
1131
+ "value": 0.3,
1132
+ "minimum": 0.0,
1133
+ "maximum": 1.0,
1134
+ "step": null,
1135
+ "visible": true,
1136
+ "scale": null,
1137
+ "render": true,
1138
+ "exclude_value": false
1139
+ },
1140
+ "rms_mix_rate": {
1141
+ "label": "RMS mix rate",
1142
+ "info": "How much to mimic the loudness of the input voice (0) or use a fixed loudness (1). A value of 1 is recommended for most cases.<br><br>",
1143
+ "value": 1,
1144
+ "minimum": 0.0,
1145
+ "maximum": 1.0,
1146
+ "step": null,
1147
+ "visible": true,
1148
+ "scale": null,
1149
+ "render": true,
1150
+ "exclude_value": false
1151
+ },
1152
+ "protect_rate": {
1153
+ "label": "Protect rate",
1154
+ "info": "Controls the extent to which consonants and breathing sounds are protected from artifacts. A higher value offers more protection but may worsen the indexing effect.<br><br>",
1155
+ "value": 0.33,
1156
+ "minimum": 0.0,
1157
+ "maximum": 0.5,
1158
+ "step": null,
1159
+ "visible": true,
1160
+ "scale": null,
1161
+ "render": true,
1162
+ "exclude_value": false
1163
+ },
1164
+ "hop_length": {
1165
+ "label": "Hop length",
1166
+ "info": "How often, in milliseconds, the CREPE-based pitch extraction method checks for pitch changes. Lower values give better pitch accuracy, but lead to longer conversion times and a higher risk of voice cracks.",
1167
+ "value": 128,
1168
+ "minimum": 1.0,
1169
+ "maximum": 512.0,
1170
+ "step": 1.0,
1171
+ "visible": true,
1172
+ "scale": null,
1173
+ "render": true,
1174
+ "exclude_value": false
1175
+ },
1176
+ "split_voice": {
1177
+ "label": "Split input voice",
1178
+ "info": "Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.",
1179
+ "value": false,
1180
+ "visible": true,
1181
+ "scale": null,
1182
+ "render": true,
1183
+ "exclude_value": false
1184
+ },
1185
+ "autotune_voice": {
1186
+ "label": "Autotune converted voice",
1187
+ "info": "Whether to apply autotune to the converted voice.<br><br>",
1188
+ "value": false,
1189
+ "visible": true,
1190
+ "scale": null,
1191
+ "render": true,
1192
+ "exclude_value": true
1193
+ },
1194
+ "autotune_strength": {
1195
+ "label": "Autotune intensity",
1196
+ "info": "Higher values result in stronger snapping to the chromatic grid and more artifacts.",
1197
+ "value": 1,
1198
+ "minimum": 0.0,
1199
+ "maximum": 1.0,
1200
+ "step": null,
1201
+ "visible": false,
1202
+ "scale": null,
1203
+ "render": true,
1204
+ "exclude_value": false
1205
+ },
1206
+ "sid": {
1207
+ "label": "Speaker ID",
1208
+ "info": "Speaker ID for multi-speaker models.",
1209
+ "value": 0,
1210
+ "precision": 0,
1211
+ "visible": true,
1212
+ "scale": null,
1213
+ "render": true,
1214
+ "exclude_value": false
1215
+ },
1216
+ "output_sr": {
1217
+ "label": "Output sample rate",
1218
+ "info": "The sample rate of the mixed output track.",
1219
+ "value": 44100,
1220
+ "choices": [
1221
+ 16000,
1222
+ 44100,
1223
+ 48000,
1224
+ 96000,
1225
+ 192000
1226
+ ],
1227
+ "multiselect": null,
1228
+ "allow_custom_value": false,
1229
+ "type": "value",
1230
+ "visible": true,
1231
+ "scale": null,
1232
+ "render": true,
1233
+ "exclude_value": false
1234
+ },
1235
+ "output_format": {
1236
+ "label": "Output format",
1237
+ "info": "The audio format of the mixed output track.",
1238
+ "value": "mp3",
1239
+ "choices": [
1240
+ "mp3",
1241
+ "wav",
1242
+ "flac",
1243
+ "ogg",
1244
+ "m4a",
1245
+ "aac"
1246
+ ],
1247
+ "multiselect": null,
1248
+ "allow_custom_value": false,
1249
+ "type": "value",
1250
+ "visible": true,
1251
+ "scale": null,
1252
+ "render": true,
1253
+ "exclude_value": false
1254
+ },
1255
+ "output_name": {
1256
+ "label": "Output name",
1257
+ "info": "If no name is provided, a suitable name will be generated automatically.",
1258
+ "value": null,
1259
+ "visible": true,
1260
+ "scale": null,
1261
+ "render": true,
1262
+ "exclude_value": true,
1263
+ "placeholder": "Ultimate RVC output"
1264
+ },
1265
+ "source_type": {
1266
+ "label": "Source type",
1267
+ "info": "The type of source to generate speech from.",
1268
+ "value": "Text",
1269
+ "choices": [
1270
+ "Text",
1271
+ "Local file"
1272
+ ],
1273
+ "multiselect": null,
1274
+ "allow_custom_value": false,
1275
+ "type": "index",
1276
+ "visible": true,
1277
+ "scale": null,
1278
+ "render": true,
1279
+ "exclude_value": true
1280
+ },
1281
+ "source": {
1282
+ "label": "Source",
1283
+ "info": "Text to generate speech from.",
1284
+ "value": null,
1285
+ "visible": true,
1286
+ "scale": null,
1287
+ "render": true,
1288
+ "exclude_value": true,
1289
+ "placeholder": null
1290
+ },
1291
+ "edge_tts_voice": {
1292
+ "label": "Edge TTS voice",
1293
+ "info": "Select a voice to use for text-to-speech conversion.",
1294
+ "value": null,
1295
+ "choices": null,
1296
+ "multiselect": null,
1297
+ "allow_custom_value": false,
1298
+ "type": "value",
1299
+ "visible": true,
1300
+ "scale": null,
1301
+ "render": false,
1302
+ "exclude_value": true
1303
+ },
1304
+ "n_octaves": {
1305
+ "label": "Octave shift",
1306
+ "info": "The number of octaves to pitch-shift the converted speech by. Use 1 for male-to-female and -1 for vice versa.",
1307
+ "value": 0,
1308
+ "minimum": -3.0,
1309
+ "maximum": 3.0,
1310
+ "step": 1.0,
1311
+ "visible": true,
1312
+ "scale": null,
1313
+ "render": true,
1314
+ "exclude_value": false
1315
+ },
1316
+ "n_semitones": {
1317
+ "label": "Semitone shift",
1318
+ "info": "The number of semitones to pitch-shift the converted speech by.",
1319
+ "value": 0,
1320
+ "minimum": -12.0,
1321
+ "maximum": 12.0,
1322
+ "step": 1.0,
1323
+ "visible": true,
1324
+ "scale": null,
1325
+ "render": true,
1326
+ "exclude_value": false
1327
+ },
1328
+ "tts_pitch_shift": {
1329
+ "label": "Edge TTS pitch shift",
1330
+ "info": "The number of hertz to shift the pitch of the speech generated by Edge TTS.",
1331
+ "value": 0,
1332
+ "minimum": -100.0,
1333
+ "maximum": 100.0,
1334
+ "step": 1.0,
1335
+ "visible": true,
1336
+ "scale": null,
1337
+ "render": true,
1338
+ "exclude_value": false
1339
+ },
1340
+ "tts_speed_change": {
1341
+ "label": "TTS speed change",
1342
+ "info": "The percentage change in the speed of the speech generated by Edge TTS.",
1343
+ "value": 0,
1344
+ "minimum": -50.0,
1345
+ "maximum": 100.0,
1346
+ "step": 1.0,
1347
+ "visible": true,
1348
+ "scale": null,
1349
+ "render": true,
1350
+ "exclude_value": false
1351
+ },
1352
+ "tts_volume_change": {
1353
+ "label": "TTS volume change",
1354
+ "info": "The percentage change in the volume of the speech generated by Edge TTS.",
1355
+ "value": 0,
1356
+ "minimum": -100.0,
1357
+ "maximum": 100.0,
1358
+ "step": 1.0,
1359
+ "visible": true,
1360
+ "scale": null,
1361
+ "render": true,
1362
+ "exclude_value": false
1363
+ },
1364
+ "clean_voice": {
1365
+ "label": "Clean converted voice",
1366
+ "info": "Whether to clean the converted voice using noise reduction algorithms.<br><br>",
1367
+ "value": true,
1368
+ "visible": true,
1369
+ "scale": null,
1370
+ "render": true,
1371
+ "exclude_value": true
1372
+ },
1373
+ "clean_strength": {
1374
+ "label": "Cleaning intensity",
1375
+ "info": "Higher values result in stronger cleaning, but may lead to a more compressed sound.",
1376
+ "value": 0.7,
1377
+ "minimum": 0.0,
1378
+ "maximum": 1.0,
1379
+ "step": 0.1,
1380
+ "visible": true,
1381
+ "scale": null,
1382
+ "render": true,
1383
+ "exclude_value": false
1384
+ },
1385
+ "output_gain": {
1386
+ "label": "Output gain",
1387
+ "info": "The gain to apply to the converted speech.<br><br>",
1388
+ "value": 0,
1389
+ "minimum": -20.0,
1390
+ "maximum": 20.0,
1391
+ "step": 1.0,
1392
+ "visible": true,
1393
+ "scale": null,
1394
+ "render": true,
1395
+ "exclude_value": false
1396
+ },
1397
+ "intermediate_audio": {
1398
+ "speech": {
1399
+ "label": "Speech",
1400
+ "value": null,
1401
+ "visible": true,
1402
+ "scale": null,
1403
+ "render": true,
1404
+ "exclude_value": true,
1405
+ "interactive": null
1406
+ },
1407
+ "converted_speech": {
1408
+ "label": "Converted speech",
1409
+ "value": null,
1410
+ "visible": true,
1411
+ "scale": null,
1412
+ "render": true,
1413
+ "exclude_value": true,
1414
+ "interactive": null
1415
+ }
1416
+ },
1417
+ "show_intermediate_audio": {
1418
+ "label": "Show intermediate audio",
1419
+ "info": "Show intermediate audio tracks produced during speech generation.",
1420
+ "value": false,
1421
+ "visible": true,
1422
+ "scale": null,
1423
+ "render": true,
1424
+ "exclude_value": true
1425
+ }
1426
+ },
1427
+ "multi_step": {
1428
+ "embedder_model": {
1429
+ "label": "Embedder model",
1430
+ "info": "The model to use for generating speaker embeddings.",
1431
+ "value": "contentvec",
1432
+ "choices": [
1433
+ "contentvec",
1434
+ "chinese-hubert-base",
1435
+ "japanese-hubert-base",
1436
+ "korean-hubert-base",
1437
+ "custom"
1438
+ ],
1439
+ "multiselect": null,
1440
+ "allow_custom_value": false,
1441
+ "type": "value",
1442
+ "visible": true,
1443
+ "scale": null,
1444
+ "render": true,
1445
+ "exclude_value": true
1446
+ },
1447
+ "custom_embedder_model": {
1448
+ "label": "Custom embedder model",
1449
+ "info": "Select a custom embedder model from the dropdown.",
1450
+ "value": null,
1451
+ "choices": null,
1452
+ "multiselect": null,
1453
+ "allow_custom_value": false,
1454
+ "type": "value",
1455
+ "visible": false,
1456
+ "scale": null,
1457
+ "render": false,
1458
+ "exclude_value": true
1459
+ },
1460
+ "voice_model": {
1461
+ "label": "Voice model",
1462
+ "info": "Select a model to use for voice conversion.",
1463
+ "value": null,
1464
+ "choices": null,
1465
+ "multiselect": null,
1466
+ "allow_custom_value": false,
1467
+ "type": "value",
1468
+ "visible": true,
1469
+ "scale": null,
1470
+ "render": false,
1471
+ "exclude_value": true
1472
+ },
1473
+ "f0_methods": {
1474
+ "label": "Pitch extraction algorithm(s)",
1475
+ "info": "If more than one method is selected, then the median of the pitch values extracted by each method is used. RMVPE is recommended for most cases and is the default when no method is selected.",
1476
+ "value": [
1477
+ "rmvpe"
1478
+ ],
1479
+ "choices": [
1480
+ "rmvpe",
1481
+ "crepe",
1482
+ "crepe-tiny",
1483
+ "fcpe"
1484
+ ],
1485
+ "multiselect": true,
1486
+ "allow_custom_value": false,
1487
+ "type": "value",
1488
+ "visible": true,
1489
+ "scale": null,
1490
+ "render": true,
1491
+ "exclude_value": false
1492
+ },
1493
+ "index_rate": {
1494
+ "label": "Index rate",
1495
+ "info": "Increase to bias the conversion towards the accent of the voice model. Decrease to potentially reduce artifacts coming from the voice model.<br><br><br>",
1496
+ "value": 0.3,
1497
+ "minimum": 0.0,
1498
+ "maximum": 1.0,
1499
+ "step": null,
1500
+ "visible": true,
1501
+ "scale": null,
1502
+ "render": true,
1503
+ "exclude_value": false
1504
+ },
1505
+ "rms_mix_rate": {
1506
+ "label": "RMS mix rate",
1507
+ "info": "How much to mimic the loudness of the input voice (0) or use a fixed loudness (1). A value of 1 is recommended for most cases.<br><br>",
1508
+ "value": 1,
1509
+ "minimum": 0.0,
1510
+ "maximum": 1.0,
1511
+ "step": null,
1512
+ "visible": true,
1513
+ "scale": null,
1514
+ "render": true,
1515
+ "exclude_value": false
1516
+ },
1517
+ "protect_rate": {
1518
+ "label": "Protect rate",
1519
+ "info": "Controls the extent to which consonants and breathing sounds are protected from artifacts. A higher value offers more protection but may worsen the indexing effect.<br><br>",
1520
+ "value": 0.33,
1521
+ "minimum": 0.0,
1522
+ "maximum": 0.5,
1523
+ "step": null,
1524
+ "visible": true,
1525
+ "scale": null,
1526
+ "render": true,
1527
+ "exclude_value": false
1528
+ },
1529
+ "hop_length": {
1530
+ "label": "Hop length",
1531
+ "info": "How often, in milliseconds, the CREPE-based pitch extraction method checks for pitch changes. Lower values give better pitch accuracy, but lead to longer conversion times and a higher risk of voice cracks.",
1532
+ "value": 128,
1533
+ "minimum": 1.0,
1534
+ "maximum": 512.0,
1535
+ "step": 1.0,
1536
+ "visible": true,
1537
+ "scale": null,
1538
+ "render": true,
1539
+ "exclude_value": false
1540
+ },
1541
+ "split_voice": {
1542
+ "label": "Split input voice",
1543
+ "info": "Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.",
1544
+ "value": false,
1545
+ "visible": true,
1546
+ "scale": null,
1547
+ "render": true,
1548
+ "exclude_value": false
1549
+ },
1550
+ "autotune_voice": {
1551
+ "label": "Autotune converted voice",
1552
+ "info": "Whether to apply autotune to the converted voice.<br><br>",
1553
+ "value": false,
1554
+ "visible": true,
1555
+ "scale": null,
1556
+ "render": true,
1557
+ "exclude_value": true
1558
+ },
1559
+ "autotune_strength": {
1560
+ "label": "Autotune intensity",
1561
+ "info": "Higher values result in stronger snapping to the chromatic grid and more artifacts.",
1562
+ "value": 1,
1563
+ "minimum": 0.0,
1564
+ "maximum": 1.0,
1565
+ "step": null,
1566
+ "visible": false,
1567
+ "scale": null,
1568
+ "render": true,
1569
+ "exclude_value": false
1570
+ },
1571
+ "sid": {
1572
+ "label": "Speaker ID",
1573
+ "info": "Speaker ID for multi-speaker models.",
1574
+ "value": 0,
1575
+ "precision": 0,
1576
+ "visible": true,
1577
+ "scale": null,
1578
+ "render": true,
1579
+ "exclude_value": false
1580
+ },
1581
+ "output_sr": {
1582
+ "label": "Output sample rate",
1583
+ "info": "The sample rate of the mixed output track.",
1584
+ "value": 44100,
1585
+ "choices": [
1586
+ 16000,
1587
+ 44100,
1588
+ 48000,
1589
+ 96000,
1590
+ 192000
1591
+ ],
1592
+ "multiselect": null,
1593
+ "allow_custom_value": false,
1594
+ "type": "value",
1595
+ "visible": true,
1596
+ "scale": null,
1597
+ "render": true,
1598
+ "exclude_value": false
1599
+ },
1600
+ "output_format": {
1601
+ "label": "Output format",
1602
+ "info": "The audio format of the mixed output track.",
1603
+ "value": "mp3",
1604
+ "choices": [
1605
+ "mp3",
1606
+ "wav",
1607
+ "flac",
1608
+ "ogg",
1609
+ "m4a",
1610
+ "aac"
1611
+ ],
1612
+ "multiselect": null,
1613
+ "allow_custom_value": false,
1614
+ "type": "value",
1615
+ "visible": true,
1616
+ "scale": null,
1617
+ "render": true,
1618
+ "exclude_value": false
1619
+ },
1620
+ "output_name": {
1621
+ "label": "Output name",
1622
+ "info": "If no name is provided, a suitable name will be generated automatically.",
1623
+ "value": null,
1624
+ "visible": true,
1625
+ "scale": null,
1626
+ "render": true,
1627
+ "exclude_value": true,
1628
+ "placeholder": "Ultimate RVC output"
1629
+ },
1630
+ "source_type": {
1631
+ "label": "Source type",
1632
+ "info": "The type of source to generate speech from.",
1633
+ "value": "Text",
1634
+ "choices": [
1635
+ "Text",
1636
+ "Local file"
1637
+ ],
1638
+ "multiselect": null,
1639
+ "allow_custom_value": false,
1640
+ "type": "index",
1641
+ "visible": true,
1642
+ "scale": null,
1643
+ "render": true,
1644
+ "exclude_value": true
1645
+ },
1646
+ "source": {
1647
+ "label": "Source",
1648
+ "info": "Text to generate speech from.",
1649
+ "value": null,
1650
+ "visible": true,
1651
+ "scale": null,
1652
+ "render": true,
1653
+ "exclude_value": true,
1654
+ "placeholder": null
1655
+ },
1656
+ "edge_tts_voice": {
1657
+ "label": "Edge TTS voice",
1658
+ "info": "Select a voice to use for text-to-speech conversion.",
1659
+ "value": null,
1660
+ "choices": null,
1661
+ "multiselect": null,
1662
+ "allow_custom_value": false,
1663
+ "type": "value",
1664
+ "visible": true,
1665
+ "scale": null,
1666
+ "render": false,
1667
+ "exclude_value": true
1668
+ },
1669
+ "n_octaves": {
1670
+ "label": "Octave shift",
1671
+ "info": "The number of octaves to pitch-shift the converted speech by. Use 1 for male-to-female and -1 for vice versa.",
1672
+ "value": 0,
1673
+ "minimum": -3.0,
1674
+ "maximum": 3.0,
1675
+ "step": 1.0,
1676
+ "visible": true,
1677
+ "scale": null,
1678
+ "render": true,
1679
+ "exclude_value": false
1680
+ },
1681
+ "n_semitones": {
1682
+ "label": "Semitone shift",
1683
+ "info": "The number of semitones to pitch-shift the converted speech by.",
1684
+ "value": 0,
1685
+ "minimum": -12.0,
1686
+ "maximum": 12.0,
1687
+ "step": 1.0,
1688
+ "visible": true,
1689
+ "scale": null,
1690
+ "render": true,
1691
+ "exclude_value": false
1692
+ },
1693
+ "tts_pitch_shift": {
1694
+ "label": "Edge TTS pitch shift",
1695
+ "info": "The number of hertz to shift the pitch of the speech generated by Edge TTS.",
1696
+ "value": 0,
1697
+ "minimum": -100.0,
1698
+ "maximum": 100.0,
1699
+ "step": 1.0,
1700
+ "visible": true,
1701
+ "scale": null,
1702
+ "render": true,
1703
+ "exclude_value": false
1704
+ },
1705
+ "tts_speed_change": {
1706
+ "label": "TTS speed change",
1707
+ "info": "The percentage change in the speed of the speech generated by Edge TTS.",
1708
+ "value": 0,
1709
+ "minimum": -50.0,
1710
+ "maximum": 100.0,
1711
+ "step": 1.0,
1712
+ "visible": true,
1713
+ "scale": null,
1714
+ "render": true,
1715
+ "exclude_value": false
1716
+ },
1717
+ "tts_volume_change": {
1718
+ "label": "TTS volume change",
1719
+ "info": "The percentage change in the volume of the speech generated by Edge TTS.",
1720
+ "value": 0,
1721
+ "minimum": -100.0,
1722
+ "maximum": 100.0,
1723
+ "step": 1.0,
1724
+ "visible": true,
1725
+ "scale": null,
1726
+ "render": true,
1727
+ "exclude_value": false
1728
+ },
1729
+ "clean_voice": {
1730
+ "label": "Clean converted voice",
1731
+ "info": "Whether to clean the converted voice using noise reduction algorithms.<br><br>",
1732
+ "value": true,
1733
+ "visible": true,
1734
+ "scale": null,
1735
+ "render": true,
1736
+ "exclude_value": true
1737
+ },
1738
+ "clean_strength": {
1739
+ "label": "Cleaning intensity",
1740
+ "info": "Higher values result in stronger cleaning, but may lead to a more compressed sound.",
1741
+ "value": 0.7,
1742
+ "minimum": 0.0,
1743
+ "maximum": 1.0,
1744
+ "step": 0.1,
1745
+ "visible": true,
1746
+ "scale": null,
1747
+ "render": true,
1748
+ "exclude_value": false
1749
+ },
1750
+ "output_gain": {
1751
+ "label": "Output gain",
1752
+ "info": "The gain to apply to the converted speech.<br><br>",
1753
+ "value": 0,
1754
+ "minimum": -20.0,
1755
+ "maximum": 20.0,
1756
+ "step": 1.0,
1757
+ "visible": true,
1758
+ "scale": null,
1759
+ "render": true,
1760
+ "exclude_value": false
1761
+ },
1762
+ "input_audio": {
1763
+ "speech": {
1764
+ "label": "Speech",
1765
+ "value": null,
1766
+ "visible": true,
1767
+ "scale": null,
1768
+ "render": false,
1769
+ "exclude_value": true,
1770
+ "interactive": null
1771
+ },
1772
+ "converted_speech": {
1773
+ "label": "Converted speech",
1774
+ "value": null,
1775
+ "visible": true,
1776
+ "scale": null,
1777
+ "render": false,
1778
+ "exclude_value": true,
1779
+ "interactive": null
1780
+ }
1781
+ }
1782
+ }
1783
+ },
1784
+ "training": {
1785
+ "multi_step": {
1786
+ "embedder_model": {
1787
+ "label": "Embedder model",
1788
+ "info": "The model to use for generating speaker embeddings.",
1789
+ "value": "contentvec",
1790
+ "choices": [
1791
+ "contentvec",
1792
+ "chinese-hubert-base",
1793
+ "japanese-hubert-base",
1794
+ "korean-hubert-base",
1795
+ "custom"
1796
+ ],
1797
+ "multiselect": null,
1798
+ "allow_custom_value": false,
1799
+ "type": "value",
1800
+ "visible": true,
1801
+ "scale": null,
1802
+ "render": true,
1803
+ "exclude_value": true
1804
+ },
1805
+ "custom_embedder_model": {
1806
+ "label": "Custom embedder model",
1807
+ "info": "Select a custom embedder model from the dropdown.",
1808
+ "value": null,
1809
+ "choices": null,
1810
+ "multiselect": null,
1811
+ "allow_custom_value": false,
1812
+ "type": "value",
1813
+ "visible": false,
1814
+ "scale": null,
1815
+ "render": false,
1816
+ "exclude_value": true
1817
+ },
1818
+ "dataset_type": {
1819
+ "label": "Dataset type",
1820
+ "info": "Select the type of dataset to preprocess.",
1821
+ "value": "New dataset",
1822
+ "choices": [
1823
+ "New dataset",
1824
+ "Existing dataset"
1825
+ ],
1826
+ "multiselect": null,
1827
+ "allow_custom_value": false,
1828
+ "type": "value",
1829
+ "visible": true,
1830
+ "scale": null,
1831
+ "render": true,
1832
+ "exclude_value": true
1833
+ },
1834
+ "dataset": {
1835
+ "label": "Dataset path",
1836
+ "info": "The path to an existing dataset. Either select a path to a previously created dataset or provide a path to an external dataset.",
1837
+ "value": null,
1838
+ "choices": null,
1839
+ "multiselect": null,
1840
+ "allow_custom_value": true,
1841
+ "type": "value",
1842
+ "visible": false,
1843
+ "scale": null,
1844
+ "render": false,
1845
+ "exclude_value": true
1846
+ },
1847
+ "dataset_name": {
1848
+ "label": "Dataset name",
1849
+ "info": "The name of the new dataset. If the dataset already exists, the provided audio files will be added to it.",
1850
+ "value": "My dataset",
1851
+ "visible": true,
1852
+ "scale": null,
1853
+ "render": true,
1854
+ "exclude_value": true,
1855
+ "placeholder": null
1856
+ },
1857
+ "preprocess_model": {
1858
+ "label": "Model name",
1859
+ "info": "Name of the model to preprocess the given dataset for. Either select an existing model from the dropdown or provide the name of a new model.",
1860
+ "value": "My model",
1861
+ "choices": null,
1862
+ "multiselect": null,
1863
+ "allow_custom_value": true,
1864
+ "type": "value",
1865
+ "visible": true,
1866
+ "scale": null,
1867
+ "render": false,
1868
+ "exclude_value": true
1869
+ },
1870
+ "sample_rate": {
1871
+ "label": "Sample rate",
1872
+ "info": "Target sample rate for the audio files in the provided dataset.",
1873
+ "value": "40000",
1874
+ "choices": [
1875
+ "32000",
1876
+ "40000",
1877
+ "48000"
1878
+ ],
1879
+ "multiselect": null,
1880
+ "allow_custom_value": false,
1881
+ "type": "value",
1882
+ "visible": true,
1883
+ "scale": null,
1884
+ "render": true,
1885
+ "exclude_value": false
1886
+ },
1887
+ "filter_audio": {
1888
+ "label": "Filter audio",
1889
+ "info": "Whether to remove low-frequency sounds from the audio files in the provided dataset by applying a high-pass Butterworth filter.<br><br>",
1890
+ "value": true,
1891
+ "visible": true,
1892
+ "scale": null,
1893
+ "render": true,
1894
+ "exclude_value": false
1895
+ },
1896
+ "clean_audio": {
1897
+ "label": "Clean audio",
1898
+ "info": "Whether to clean the audio files in the provided dataset using noise reduction algorithms.<br><br><br>",
1899
+ "value": false,
1900
+ "visible": true,
1901
+ "scale": null,
1902
+ "render": true,
1903
+ "exclude_value": true
1904
+ },
1905
+ "clean_strength": {
1906
+ "label": "Cleaning intensity",
1907
+ "info": "Higher values result in stronger cleaning, but may lead to a more compressed sound.",
1908
+ "value": 0.7,
1909
+ "minimum": 0.0,
1910
+ "maximum": 1.0,
1911
+ "step": 0.1,
1912
+ "visible": false,
1913
+ "scale": null,
1914
+ "render": true,
1915
+ "exclude_value": false
1916
+ },
1917
+ "split_method": {
1918
+ "label": "Audio splitting method",
1919
+ "info": "The method to use for splitting the audio files in the provided dataset. Use the `Skip` method to skip splitting if the audio files are already split. Use the `Simple` method if excessive silence has already been removed from the audio files. Use the `Automatic` method for automatic silence detection and splitting around it.",
1920
+ "value": "Automatic",
1921
+ "choices": [
1922
+ "Skip",
1923
+ "Simple",
1924
+ "Automatic"
1925
+ ],
1926
+ "multiselect": null,
1927
+ "allow_custom_value": false,
1928
+ "type": "value",
1929
+ "visible": true,
1930
+ "scale": null,
1931
+ "render": true,
1932
+ "exclude_value": true
1933
+ },
1934
+ "chunk_len": {
1935
+ "label": "Chunk length",
1936
+ "info": "Length of split audio chunks.",
1937
+ "value": 3,
1938
+ "minimum": 0.5,
1939
+ "maximum": 5.0,
1940
+ "step": 0.1,
1941
+ "visible": false,
1942
+ "scale": null,
1943
+ "render": true,
1944
+ "exclude_value": false
1945
+ },
1946
+ "overlap_len": {
1947
+ "label": "Overlap length",
1948
+ "info": "Length of overlap between split audio chunks.",
1949
+ "value": 0.3,
1950
+ "minimum": 0.0,
1951
+ "maximum": 0.4,
1952
+ "step": 0.1,
1953
+ "visible": false,
1954
+ "scale": null,
1955
+ "render": true,
1956
+ "exclude_value": false
1957
+ },
1958
+ "preprocess_cores": {
1959
+ "label": "CPU cores",
1960
+ "info": "The number of CPU cores to use for multi-threading.",
1961
+ "value": null,
1962
+ "minimum": 1.0,
1963
+ "maximum": 1.0,
1964
+ "step": 1.0,
1965
+ "visible": true,
1966
+ "scale": null,
1967
+ "render": true,
1968
+ "exclude_value": true
1969
+ },
1970
+ "extract_model": {
1971
+ "label": "Model name",
1972
+ "info": "Name of the model with an associated preprocessed dataset to extract training features from. When a new dataset is preprocessed, its associated model is selected by default.",
1973
+ "value": null,
1974
+ "choices": null,
1975
+ "multiselect": null,
1976
+ "allow_custom_value": false,
1977
+ "type": "value",
1978
+ "visible": true,
1979
+ "scale": null,
1980
+ "render": false,
1981
+ "exclude_value": true
1982
+ },
1983
+ "f0_method": {
1984
+ "label": "F0 method",
1985
+ "info": "The method to use for extracting pitch features.",
1986
+ "value": "rmvpe",
1987
+ "choices": [
1988
+ "rmvpe",
1989
+ "crepe",
1990
+ "crepe-tiny"
1991
+ ],
1992
+ "multiselect": null,
1993
+ "allow_custom_value": false,
1994
+ "type": "value",
1995
+ "visible": true,
1996
+ "scale": null,
1997
+ "render": true,
1998
+ "exclude_value": true
1999
+ },
2000
+ "hop_length": {
2001
+ "label": "Hop length",
2002
+ "info": "The hop length to use for extracting pitch features.<br><br>",
2003
+ "value": 128,
2004
+ "minimum": 1.0,
2005
+ "maximum": 512.0,
2006
+ "step": 1.0,
2007
+ "visible": false,
2008
+ "scale": null,
2009
+ "render": true,
2010
+ "exclude_value": false
2011
+ },
2012
+ "include_mutes": {
2013
+ "label": "Include mutes",
2014
+ "info": "The number of mute audio files to include in the generated training file list. Adding silent files enables the training model to handle pure silence in inferred audio files. If the preprocessed audio dataset already contains segments of pure silence, set this to 0.",
2015
+ "value": 2,
2016
+ "minimum": 0.0,
2017
+ "maximum": 10.0,
2018
+ "step": 1.0,
2019
+ "visible": true,
2020
+ "scale": null,
2021
+ "render": true,
2022
+ "exclude_value": false
2023
+ },
2024
+ "extraction_cores": {
2025
+ "label": "CPU cores",
2026
+ "info": "The number of CPU cores to use for multi-threading.",
2027
+ "value": null,
2028
+ "minimum": 1.0,
2029
+ "maximum": 1.0,
2030
+ "step": 1.0,
2031
+ "visible": true,
2032
+ "scale": null,
2033
+ "render": true,
2034
+ "exclude_value": true
2035
+ },
2036
+ "extraction_acceleration": {
2037
+ "label": "Hardware acceleration",
2038
+ "info": "The type of hardware acceleration to use. 'Automatic' will automatically select the first available GPU and fall back to CPU if no GPUs are available.",
2039
+ "value": "Automatic",
2040
+ "choices": [
2041
+ "Automatic",
2042
+ "CPU",
2043
+ "GPU"
2044
+ ],
2045
+ "multiselect": null,
2046
+ "allow_custom_value": false,
2047
+ "type": "value",
2048
+ "visible": true,
2049
+ "scale": null,
2050
+ "render": true,
2051
+ "exclude_value": true
2052
+ },
2053
+ "extraction_gpus": {
2054
+ "label": "GPU(s)",
2055
+ "info": "The GPU(s) to use for hardware acceleration.",
2056
+ "value": null,
2057
+ "choices": null,
2058
+ "multiselect": true,
2059
+ "allow_custom_value": false,
2060
+ "type": "value",
2061
+ "visible": false,
2062
+ "scale": null,
2063
+ "render": true,
2064
+ "exclude_value": true
2065
+ },
2066
+ "train_model": {
2067
+ "label": "Model name",
2068
+ "info": "Name of the model to train. When training features are extracted for a new model, its name is selected by default.",
2069
+ "value": null,
2070
+ "choices": null,
2071
+ "multiselect": null,
2072
+ "allow_custom_value": false,
2073
+ "type": "value",
2074
+ "visible": true,
2075
+ "scale": null,
2076
+ "render": false,
2077
+ "exclude_value": true
2078
+ },
2079
+ "num_epochs": {
2080
+ "label": "Number of epochs",
2081
+ "info": "The number of epochs to train the voice model. A higher number can improve voice model performance but may lead to overtraining.",
2082
+ "value": 500,
2083
+ "minimum": 1.0,
2084
+ "maximum": 1000.0,
2085
+ "step": 1.0,
2086
+ "visible": true,
2087
+ "scale": null,
2088
+ "render": true,
2089
+ "exclude_value": false
2090
+ },
2091
+ "batch_size": {
2092
+ "label": "Batch size",
2093
+ "info": "The number of samples in each training batch. It is advisable to align this value with the available VRAM of your GPU.",
2094
+ "value": 8,
2095
+ "minimum": 1.0,
2096
+ "maximum": 64.0,
2097
+ "step": 1.0,
2098
+ "visible": true,
2099
+ "scale": null,
2100
+ "render": true,
2101
+ "exclude_value": false
2102
+ },
2103
+ "detect_overtraining": {
2104
+ "label": "Detect overtraining",
2105
+ "info": "Whether to detect overtraining to prevent the voice model from learning the training data too well and losing the ability to generalize to new data.",
2106
+ "value": false,
2107
+ "visible": true,
2108
+ "scale": null,
2109
+ "render": true,
2110
+ "exclude_value": true
2111
+ },
2112
+ "overtraining_threshold": {
2113
+ "label": "Overtraining threshold",
2114
+ "info": "The maximum number of epochs to continue training without any observed improvement in voice model performance.",
2115
+ "value": 50,
2116
+ "minimum": 1.0,
2117
+ "maximum": 100.0,
2118
+ "step": null,
2119
+ "visible": false,
2120
+ "scale": null,
2121
+ "render": true,
2122
+ "exclude_value": false
2123
+ },
2124
+ "vocoder": {
2125
+ "label": "Vocoder",
2126
+ "info": "The vocoder to use for audio synthesis during training. HiFi-GAN provides basic audio fidelity, while RefineGAN provides the highest audio fidelity.",
2127
+ "value": "HiFi-GAN",
2128
+ "choices": [
2129
+ "HiFi-GAN",
2130
+ "MRF HiFi-GAN",
2131
+ "RefineGAN"
2132
+ ],
2133
+ "multiselect": null,
2134
+ "allow_custom_value": false,
2135
+ "type": "value",
2136
+ "visible": true,
2137
+ "scale": null,
2138
+ "render": true,
2139
+ "exclude_value": false
2140
+ },
2141
+ "index_algorithm": {
2142
+ "label": "Index algorithm",
2143
+ "info": "The method to use for generating an index file for the trained voice model. `KMeans` is particularly useful for large datasets.",
2144
+ "value": "Auto",
2145
+ "choices": [
2146
+ "Auto",
2147
+ "Faiss",
2148
+ "KMeans"
2149
+ ],
2150
+ "multiselect": null,
2151
+ "allow_custom_value": false,
2152
+ "type": "value",
2153
+ "visible": true,
2154
+ "scale": null,
2155
+ "render": true,
2156
+ "exclude_value": false
2157
+ },
2158
+ "pretrained_type": {
2159
+ "label": "Pretrained model type",
2160
+ "info": "The type of pretrained model to finetune the voice model on. `None` will train the voice model from scratch, while `Default` will use a pretrained model tailored to the specific voice model architecture. `Custom` will use a custom pretrained model that you provide.",
2161
+ "value": "Default",
2162
+ "choices": [
2163
+ "None",
2164
+ "Default",
2165
+ "Custom"
2166
+ ],
2167
+ "multiselect": null,
2168
+ "allow_custom_value": false,
2169
+ "type": "value",
2170
+ "visible": true,
2171
+ "scale": null,
2172
+ "render": true,
2173
+ "exclude_value": true
2174
+ },
2175
+ "custom_pretrained_model": {
2176
+ "label": "Custom pretrained model",
2177
+ "info": "Select a custom pretrained model to finetune from the dropdown.",
2178
+ "value": null,
2179
+ "choices": null,
2180
+ "multiselect": null,
2181
+ "allow_custom_value": false,
2182
+ "type": "value",
2183
+ "visible": false,
2184
+ "scale": null,
2185
+ "render": false,
2186
+ "exclude_value": true
2187
+ },
2188
+ "save_interval": {
2189
+ "label": "Save interval",
2190
+ "info": "The epoch interval at which to save voice model weights and checkpoints. The best model weights are always saved regardless of this setting.",
2191
+ "value": 10,
2192
+ "minimum": 1.0,
2193
+ "maximum": 100.0,
2194
+ "step": 1.0,
2195
+ "visible": true,
2196
+ "scale": null,
2197
+ "render": true,
2198
+ "exclude_value": false
2199
+ },
2200
+ "save_all_checkpoints": {
2201
+ "label": "Save all checkpoints",
2202
+ "info": "Whether to save a unique checkpoint at each save interval. If not enabled, only the latest checkpoint will be saved at each interval.",
2203
+ "value": false,
2204
+ "visible": true,
2205
+ "scale": null,
2206
+ "render": true,
2207
+ "exclude_value": false
2208
+ },
2209
+ "save_all_weights": {
2210
+ "label": "Save all weights",
2211
+ "info": "Whether to save unique voice model weights at each save interval. If not enabled, only the best voice model weights will be saved.",
2212
+ "value": false,
2213
+ "visible": true,
2214
+ "scale": null,
2215
+ "render": true,
2216
+ "exclude_value": false
2217
+ },
2218
+ "clear_saved_data": {
2219
+ "label": "Clear saved data",
2220
+ "info": "Whether to delete any existing training data associated with the voice model before training commences. Enable this setting only if you are training a new voice model from scratch or restarting training.",
2221
+ "value": false,
2222
+ "visible": true,
2223
+ "scale": null,
2224
+ "render": true,
2225
+ "exclude_value": false
2226
+ },
2227
+ "upload_model": {
2228
+ "label": "Upload voice model",
2229
+ "info": "Whether to automatically upload the trained voice model so that it can be used for generation tasks within the Ultimate RVC app.",
2230
+ "value": false,
2231
+ "visible": true,
2232
+ "scale": null,
2233
+ "render": true,
2234
+ "exclude_value": true
2235
+ },
2236
+ "upload_name": {
2237
+ "label": "Upload name",
2238
+ "info": "The name to give the uploaded voice model.",
2239
+ "value": null,
2240
+ "visible": false,
2241
+ "scale": null,
2242
+ "render": true,
2243
+ "exclude_value": true,
2244
+ "placeholder": null
2245
+ },
2246
+ "training_acceleration": {
2247
+ "label": "Hardware acceleration",
2248
+ "info": "The type of hardware acceleration to use. 'Automatic' will automatically select the first available GPU and fall back to CPU if no GPUs are available.",
2249
+ "value": "Automatic",
2250
+ "choices": [
2251
+ "Automatic",
2252
+ "CPU",
2253
+ "GPU"
2254
+ ],
2255
+ "multiselect": null,
2256
+ "allow_custom_value": false,
2257
+ "type": "value",
2258
+ "visible": true,
2259
+ "scale": null,
2260
+ "render": true,
2261
+ "exclude_value": true
2262
+ },
2263
+ "training_gpus": {
2264
+ "label": "GPU(s)",
2265
+ "info": "The GPU(s) to use for hardware acceleration.",
2266
+ "value": null,
2267
+ "choices": null,
2268
+ "multiselect": true,
2269
+ "allow_custom_value": false,
2270
+ "type": "value",
2271
+ "visible": false,
2272
+ "scale": null,
2273
+ "render": true,
2274
+ "exclude_value": true
2275
+ },
2276
+ "preload_dataset": {
2277
+ "label": "Preload dataset",
2278
+ "info": "Whether to preload all training data into GPU memory. This can improve training speed but requires a lot of VRAM.<br><br>",
2279
+ "value": false,
2280
+ "visible": true,
2281
+ "scale": null,
2282
+ "render": true,
2283
+ "exclude_value": false
2284
+ },
2285
+ "reduce_memory_usage": {
2286
+ "label": "Reduce memory usage",
2287
+ "info": "Whether to reduce VRAM usage at the cost of slower training speed by enabling activation checkpointing. This is useful for GPUs with limited memory (e.g., <6GB VRAM) or when training with a batch size larger than what your GPU can normally accommodate.",
2288
+ "value": false,
2289
+ "visible": true,
2290
+ "scale": null,
2291
+ "render": true,
2292
+ "exclude_value": false
2293
+ }
2294
+ }
2295
+ },
2296
+ "management": {
2297
+ "model": {
2298
+ "voices": {
2299
+ "label": "Voice models",
2300
+ "info": "Select one or more voice models to delete.",
2301
+ "value": null,
2302
+ "choices": null,
2303
+ "multiselect": true,
2304
+ "allow_custom_value": false,
2305
+ "type": "value",
2306
+ "visible": true,
2307
+ "scale": null,
2308
+ "render": false,
2309
+ "exclude_value": true
2310
+ },
2311
+ "embedders": {
2312
+ "label": "Custom embedder models",
2313
+ "info": "Select one or more embedder models to delete.",
2314
+ "value": null,
2315
+ "choices": null,
2316
+ "multiselect": true,
2317
+ "allow_custom_value": false,
2318
+ "type": "value",
2319
+ "visible": true,
2320
+ "scale": null,
2321
+ "render": false,
2322
+ "exclude_value": true
2323
+ },
2324
+ "pretraineds": {
2325
+ "label": "Custom pretrained models",
2326
+ "info": "Select one or more pretrained models to delete.",
2327
+ "value": null,
2328
+ "choices": null,
2329
+ "multiselect": true,
2330
+ "allow_custom_value": false,
2331
+ "type": "value",
2332
+ "visible": true,
2333
+ "scale": null,
2334
+ "render": false,
2335
+ "exclude_value": true
2336
+ },
2337
+ "traineds": {
2338
+ "label": "Training models",
2339
+ "info": "Select one or more training models to delete.",
2340
+ "value": null,
2341
+ "choices": null,
2342
+ "multiselect": true,
2343
+ "allow_custom_value": false,
2344
+ "type": "value",
2345
+ "visible": true,
2346
+ "scale": null,
2347
+ "render": false,
2348
+ "exclude_value": true
2349
+ },
2350
+ "dummy_checkbox": {
2351
+ "label": null,
2352
+ "info": null,
2353
+ "value": false,
2354
+ "visible": false,
2355
+ "scale": null,
2356
+ "render": true,
2357
+ "exclude_value": true
2358
+ }
2359
+ },
2360
+ "audio": {
2361
+ "intermediate": {
2362
+ "label": "Song directories",
2363
+ "info": "Select one or more song directories containing intermediate audio files to delete.",
2364
+ "value": null,
2365
+ "choices": null,
2366
+ "multiselect": true,
2367
+ "allow_custom_value": false,
2368
+ "type": "value",
2369
+ "visible": true,
2370
+ "scale": null,
2371
+ "render": false,
2372
+ "exclude_value": true
2373
+ },
2374
+ "speech": {
2375
+ "label": "Speech audio files",
2376
+ "info": "Select one or more speech audio files to delete.",
2377
+ "value": null,
2378
+ "choices": null,
2379
+ "multiselect": true,
2380
+ "allow_custom_value": false,
2381
+ "type": "value",
2382
+ "visible": true,
2383
+ "scale": null,
2384
+ "render": false,
2385
+ "exclude_value": true
2386
+ },
2387
+ "output": {
2388
+ "label": "Output audio files",
2389
+ "info": "Select one or more output audio files to delete.",
2390
+ "value": null,
2391
+ "choices": null,
2392
+ "multiselect": true,
2393
+ "allow_custom_value": false,
2394
+ "type": "value",
2395
+ "visible": true,
2396
+ "scale": null,
2397
+ "render": false,
2398
+ "exclude_value": true
2399
+ },
2400
+ "dataset": {
2401
+ "label": "Dataset audio files",
2402
+ "info": "Select one or more datasets containing audio files to delete.",
2403
+ "value": null,
2404
+ "choices": null,
2405
+ "multiselect": true,
2406
+ "allow_custom_value": false,
2407
+ "type": "value",
2408
+ "visible": true,
2409
+ "scale": null,
2410
+ "render": false,
2411
+ "exclude_value": true
2412
+ },
2413
+ "dummy_checkbox": {
2414
+ "label": null,
2415
+ "info": null,
2416
+ "value": false,
2417
+ "visible": false,
2418
+ "scale": null,
2419
+ "render": true,
2420
+ "exclude_value": true
2421
+ }
2422
+ },
2423
+ "settings": {
2424
+ "load_config_name": {
2425
+ "label": "Configuration name",
2426
+ "info": "The name of a configuration to load UI settings from.",
2427
+ "value": null,
2428
+ "choices": null,
2429
+ "multiselect": null,
2430
+ "allow_custom_value": false,
2431
+ "type": "value",
2432
+ "visible": true,
2433
+ "scale": null,
2434
+ "render": false,
2435
+ "exclude_value": true
2436
+ },
2437
+ "delete_config_names": {
2438
+ "label": "Configuration names",
2439
+ "info": "Select the name of one or more configurations to delete.",
2440
+ "value": null,
2441
+ "choices": null,
2442
+ "multiselect": true,
2443
+ "allow_custom_value": false,
2444
+ "type": "value",
2445
+ "visible": true,
2446
+ "scale": null,
2447
+ "render": false,
2448
+ "exclude_value": true
2449
+ },
2450
+ "dummy_checkbox": {
2451
+ "label": null,
2452
+ "info": null,
2453
+ "value": false,
2454
+ "visible": false,
2455
+ "scale": null,
2456
+ "render": true,
2457
+ "exclude_value": true
2458
+ }
2459
+ }
2460
+ }
2461
+ }