Anonlestia committed · Commit 6728d21 · 1 Parent(s): c442a81

Upload 18 files

Ov2Super/32kfix/f0Ov2Super32kD.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2401113af8524bc5c0fe2221a81997b32f85db782f2271bb21d268f2fbf15c56
+ size 857123266
Ov2Super/32kfix/f0Ov2Super32kG.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4af1279fb8fd15af9eacbb41687fc695e74009e9dd0edc634b6296453324db4
+ size 443230526
Ov2Super/40k/f0Ov2Super40kD.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e319e21eb26137803c62847857202bc43f833c0696be72bec395a74b3c2178aa
+ size 857126469
Ov2Super/40k/f0Ov2Super40kG.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0562bdf6fa73197a503ceddf376a79f33727d64e19b1c6371ebfd6872bceccbf
+ size 438183069
Ov2Super/ov2.txt ADDED
@@ -0,0 +1,18 @@
+ I'd like to present a new test version of the Ov2 pretrain! I made it myself, improving its overall quality.
+
+ It was trained at a 40k sample rate with 18 new voices, which gives it a lot of variety.
+
+ To use it, copy both .pth files into the pretrained_v2 folder of your RVC installation. Then, in RVC, choose the 40k sample rate and V2 (versions for all sample rates will be released once I make the final version; this is just a test version). After you've preprocessed your dataset and extracted features, enter the names of the pretrains as shown in the second screenshot (don't mix up the G and D files). After that, you can begin training your model.
+
+ Here are rough guidelines for training a model on this pretrain:
+ The minimum dataset length for an ideal model is 1 minute, but you can train good models even with 10-second datasets on this pretrain.
+ Models trained on this pretrain don't require a huge epoch count: usually 40-60 epochs is enough for 3-5 minutes of data, 60-100 for 1-3 minutes, and 200-300 for 10-60 seconds.
+ Use clean datasets to get the best-sounding results :3
+
+ ---
+ 3-5 minutes: 40-60 epochs
+ 1-3 minutes: 60-100 epochs
+ 10-60 seconds: 200-300 epochs
+
+ The minimum duration for an ideal model is 1 minute, but you can train good models even with 10-second datasets on this pretrain.
+ Models trained on this pretrain do not require a large number of epochs: usually 40-60 can be enough for 3-5 minutes, 60-100 is enough for 1-3 minutes, and 200-300 is enough for 10-60 seconds.
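The epoch guidelines in ov2.txt can be read as a small lookup table. The sketch below is one possible encoding, not part of the upload: the function name `recommended_epochs` is hypothetical, and because the ranges in the note overlap at exactly 1 and 3 minutes, it assumes that a boundary value falls into the longer-dataset bucket. The note gives no guidance below 10 seconds or above 5 minutes, so those inputs raise an error rather than guess.

```python
from typing import Tuple


def recommended_epochs(dataset_seconds: float) -> Tuple[int, int]:
    """Return the (min, max) epoch range that ov2.txt suggests for a dataset length.

    Assumption (not stated in ov2.txt): a value sitting exactly on a boundary
    (60 s or 180 s) is treated as belonging to the longer-dataset range.
    """
    if dataset_seconds < 10:
        raise ValueError("ov2.txt gives no guidance for datasets under 10 seconds")
    if dataset_seconds < 60:
        return (200, 300)  # 10-60 seconds
    if dataset_seconds < 180:
        return (60, 100)   # 1-3 minutes
    if dataset_seconds <= 300:
        return (40, 60)    # 3-5 minutes
    raise ValueError("ov2.txt gives no guidance for datasets over 5 minutes")


if __name__ == "__main__":
    for seconds in (30, 90, 240):
        low, high = recommended_epochs(seconds)
        print(f"{seconds} s of audio: train for roughly {low}-{high} epochs")
```

These are the author's rough guidelines, so the returned range is a starting point rather than a hard limit.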
SnowieV3.1-X-RinE3-40K/D_Snowie-X-Rin_40k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61f831b0150fb99b61d92873535fd44ac6d9e76df2da4d7fefc4390e5c3b94e6
+ size 857125986
SnowieV3.1-X-RinE3-40K/G_Snowie-X-Rin_40k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf70da2f59cff4a414d3b10607cab15a3ab5d706e06e9b3bd1d653e6abe0f2d9
+ size 438176910
SnowieV3.1/32k/D_SnowieV3.1_32k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd0e6067e06aee2962dd1600d0a7cb9ed3ab4da858003595feaed10b544c811e
+ size 857123266
SnowieV3.1/32k/G_SnowieV3.1_32k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94f8172dc7d975fae748a40e0a4ad09148a40166fb01a9c9aecadacedbad246a
+ size 443230526
SnowieV3.1/40k/D_SnowieV3.1_40k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:33eb0b69d2eb1980105b044d7381d2c317652f011f7fa685a599aee846a68592
+ size 857123266
SnowieV3.1/40k/G_SnowieV3.1_40k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95aeacf9ac4c39830fc19bec5f11d780734f73bd53ce681d7f21b891f7da69e7
+ size 438167870
SnowieV3.1/48k/D_SnowieV3.1_48k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:342c48ce64c691aeeec10be25e9821c42e9d81e64af454f58c4493807ae8530b
+ size 857123266
SnowieV3.1/48k/G_SnowieV3.1_48k.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6883d5b2fe92f78e10ac7e87fed24a7d9ef53826835ebb908b2428003f0d1f92
+ size 452323646
TITAN/Medium/D-f040k-TITAN-Medium.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc804da752ba9cb8e3aaa90ad102edc0a6e7f033c90db53bfcce135f100d5ad1
+ size 857119946
TITAN/Medium/D-f048k-TITAN-Medium.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4999a5b66aec0a9ab7e845063eeb242e47a611ab0e6260fb0b5ca44a7c5bbc44
+ size 857126469
TITAN/Medium/G-f040k-TITAN-Medium.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:217de2fd349f48856aa6a6d9d1257b07a0addccde7b69fc7b22097f1fee5924d
+ size 438156650
TITAN/Medium/G-f048k-TITAN-Medium.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51a6dc93687f7e1a6051be3dec958289958c9c738d5c520e4fa956e7886ee153
+ size 452338845
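Each .pth entry in this commit is a Git LFS pointer (a `version`, an `oid sha256:` digest, and a `size` in bytes) rather than the weights themselves. As a rough sketch of how a downloaded checkpoint could be checked against its pointer, the helpers below parse the three fields and compare them with a local file. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical and only the Python standard library is used; this is not a substitute for the `git lfs` tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, passing the three pointer lines for `TITAN/Medium/G-f048k-TITAN-Medium.pth` to `parse_lfs_pointer` and then calling `matches_pointer` on the downloaded file would confirm that the download is complete and unmodified.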
TITAN/README.md ADDED
@@ -0,0 +1,187 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - ai
+ - rvc
+ - vc
+ - voice-cloning
+ - applio
+ - titan
+ - pretrained
+ datasets:
+ - blaise-tk/TITAN-Medium
+ pipeline_tag: audio-to-audio
+ ---
+
+ # TITAN: A Versatile, Robust, and High-Quality Pretrained Model for Retrieval-based Voice Conversion (RVC) Training
+
+ ## Overview
+
+ TITAN is a state-of-the-art pretrained model designed for Retrieval-based Voice Conversion (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/) training. It offers a robust solution for transforming voice characteristics from one speaker to another, providing high-quality results with minimal training effort.
+
+ ## Model Details
+
+ ### Titan-Medium
+
+ - Training Environment: Trained on an RTX 3060 Ti with Applio v3.1.1 (https://github.com/IAHispano/Applio), using a batch size of 8 over 3 weeks.
+ - Iterations (48k): 1018660 steps and 530 epochs
+ - Iterations (40k): 1010588 steps and 467 epochs
+ - Iterations (32k): 1001469 steps and 463 epochs
+ - Sampling rates: 48k, 40k, 32k
+ - Fine-tuning Process: Started from the RVC v2 pretrained model with pitch guidance and fine-tuned on an 11.15-hour dataset sourced from Expresso (https://arxiv.org/abs/2308.05725), also available at [datasets/blaise-tk/TITAN-Medium](https://huggingface.co/datasets/blaise-tk/TITAN-Medium).
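
The step and epoch counts listed above imply how many optimizer steps made up one epoch at each sampling rate. The short calculation below works that out. It assumes that one step corresponds to one batch of 8 items, which the README does not state explicitly, so the items-per-epoch figure is an upper bound under that assumption. The names `RUNS` and `steps_per_epoch` are illustrative only.

```python
BATCH_SIZE = 8  # batch size reported in the model details

# (total steps, total epochs) per sampling rate, copied from the list above
RUNS = {
    "48k": (1_018_660, 530),
    "40k": (1_010_588, 467),
    "32k": (1_001_469, 463),
}


def steps_per_epoch(steps: int, epochs: int) -> float:
    """Average number of training steps in one epoch."""
    return steps / epochs


for rate, (steps, epochs) in RUNS.items():
    per_epoch = steps_per_epoch(steps, epochs)
    # Assumption: every step consumes one full batch of BATCH_SIZE items.
    max_items = per_epoch * BATCH_SIZE
    print(f"{rate}: {per_epoch:.0f} steps/epoch, at most {max_items:.0f} items/epoch")
```

The reported totals divide evenly: 1922 steps per epoch at 48k, 2164 at 40k, and 2163 at 32k.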
+
+ #### Samples
+ *Tests were performed with an early checkpoint at ~700k steps, with all tests run under the same conditions.*
+
+ <table style="width:100%; text-align:center;">
+ <tr>
+ <th>Titan-Medium</th>
+ <th>Ov2</th>
+ <th>Ov2.1</th>
+ </tr>
+ <tr>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 1 - Titan.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 1 - Ov2.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ </tr>
+
+
+ <tr>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 2 - Titan.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 2 - Ov2.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ </tr>
+
+ <tr>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 1 - Titan.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 1 - Ov2.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+
+ </tr>
+ <tr>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 2 - Titan.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 2 - Ov2.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ </tr>
+
+
+ <tr>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 1 - Titan.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 1 - Ov2.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 1 - Ov2.1.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ </tr>
+
+
+ <tr>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 2 - Titan.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 2 - Ov2.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ <td>
+ <audio controls>
+ <source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 2 - Ov2.1.wav?download=true" type="audio/wav">
+ Your browser does not support the audio element.
+ </audio>
+ </td>
+ </tr>
+
+ </table>
+
+ ### Titan-Large
+
+ - Details forthcoming...
+
+ ## Collaborators
+
+ We appreciate the contributions of our collaborators who have helped in the development and refinement of TITAN.
+
+ - Mustar
+ - SimplCup
+ - UnitedShoes
+
+ ## Beta Testers
+
+ We extend our gratitude to the beta testers who provided valuable feedback during the testing phase of TITAN.
+
+ - SimplCup
+ - Leo_Frixi
+ - Light
+ - SCRFilms
+ - Ryanz
+ - Litsa_the_dancer
+
+ ## Citation
+
+ If you find TITAN useful in your research or projects, please cite our repository:
+
+ ```
+ @article{titan,
+ title={TITAN: A Versatile, Robust, and High-Quality Pretrained Model for Retrieval-based Voice Conversion (RVC) Training},
+ author={Blaise},
+ journal={Hugging Face},
+ year={2024},
+ publisher={Blaise},
+ url={https://huggingface.co/blaise-tk/TITAN/}
+ }
+ ```