lexandstuff committed on
Commit 577d144 · verified · 1 Parent(s): 8e82024

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,113 @@
---
license: mit
library_name: mlx
tags:
- mlx
- voice-conversion
- rvc
- apple-silicon
- audio
- speech
---

# RVC-MLX Pretrained Weights

MLX-compatible pretrained weights for [RVC (Retrieval-based Voice Conversion)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI), converted for use with [rvc-mlx](https://github.com/lucasnewman/rvc-mlx).

These weights enable high-quality voice conversion on Apple Silicon Macs using the MLX framework.
18
+
19
+ ## Available Models
20
+
21
+ | File | Sample Rate | Size | Description |
22
+ |------|-------------|------|-------------|
23
+ | `v2/f0G48k.safetensors` | 48 kHz | 110 MB | V2 with F0 (pitch) - highest quality |
24
+ | `v2/f0G40k.safetensors` | 40 kHz | 105 MB | V2 with F0 (pitch) |
25
+ | `v2/f0G32k.safetensors` | 32 kHz | 107 MB | V2 with F0 (pitch) |

All models use:
- **Architecture**: SynthesizerTrnMs768NSFsid
- **Input**: 768-dim ContentVec features
- **F0 Support**: Yes (pitch-aware synthesis)
32
+ ## Quick Start
33
+
34
+ ```python
35
+ from huggingface_hub import hf_hub_download
36
+
37
+ # Download the 48kHz model
38
+ weights_path = hf_hub_download(
39
+ repo_id="lexandstuff/rvc-mlx-weights",
40
+ filename="v2/f0G48k.safetensors"
41
+ )
42
+
43
+ # Download config
44
+ config_path = hf_hub_download(
45
+ repo_id="lexandstuff/rvc-mlx-weights",
46
+ filename="v2/config.json"
47
+ )
48
+ ```
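
The repository stores each `.safetensors` file through Git LFS. The pointer committed for each file records two values: the SHA-256 of the real file (`oid`) and its length in bytes (`size`). The sketch below checks a downloaded file against those values, using only the standard library (`hashlib`, `os`). The function names are hypothetical. The pointer text is the one committed for `v2/f0G48k.safetensors`.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256 oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, then hash in chunks to keep memory use bounded
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer committed for v2/f0G48k.safetensors
F0G48K_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c51f93b025a70a7cea32432de1518d1842fec4336772066cc7be2c981189ba24
size 114915560
"""
```

With the Quick Start download, `matches_pointer(weights_path, F0G48K_POINTER)` should return `True` for an intact file.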

## Usage with rvc-mlx

```python
import json

from safetensors.numpy import load_file
from rvc_mlx.models import SynthesizerTrnMs768NSFsid

# config_path and weights_path are the paths returned in Quick Start

# Load config (one entry per sample rate)
with open(config_path) as f:
    configs = json.load(f)
config = configs["48000"]  # or "40000", "32000"

# Create model
model = SynthesizerTrnMs768NSFsid(**config)

# Load weights
weights = load_file(weights_path)
# ... load weights into model
```
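
The three entries in `v2/config.json` should differ only where the sample rate requires it. The sketch below reports which keys actually vary across entries, using only the standard library. It is handy after editing the file. `varying_keys` is a hypothetical helper, not part of rvc-mlx.

```python
import json


def varying_keys(configs: dict[str, dict]) -> dict[str, dict[str, object]]:
    """Map each key whose value differs between entries to {entry name: value}."""
    all_keys = set().union(*configs.values())
    result = {}
    for key in sorted(all_keys):
        # Missing keys show up as None, so they count as a difference
        values = {name: cfg.get(key) for name, cfg in configs.items()}
        # json.dumps gives a hashable, comparable form for lists and nested lists
        if len({json.dumps(v) for v in values.values()}) > 1:
            result[key] = values
    return result
```

Pass it the `configs` dictionary loaded above. For this release's `v2/config.json` it should report only `sample_rate`, `upsample_kernel_sizes` and `upsample_rates`.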

## Model Details

These are **inference-only** weights: the posterior encoder, which is only needed for training, has been removed to reduce file size.
74
+ ### Architecture
75
+
76
+ ```
77
+ SynthesizerTrnMs768NSFsid
78
+ ├── enc_p (TextEncoder) - Encodes ContentVec + pitch
79
+ ├── flow (ResidualCoupling) - Normalizing flow for voice conversion
80
+ ├── dec (GeneratorNSF) - HiFi-GAN vocoder with neural source filter
81
+ └── emb_g (Embedding) - Speaker embedding
82
+ ```
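
You can check that a weights file matches the tree above, with no posterior-encoder tensors left over, by grouping its tensor names by their first dotted component. In the original PyTorch RVC code the posterior encoder is the `enc_q` submodule. This sketch assumes the file uses dotted keys rooted at the module names shown above; rvc-mlx may name things differently, so inspect the real keys first. The function names are hypothetical. The code accepts any mapping whose values have a `.shape` tuple, such as the NumPy arrays returned by `safetensors.numpy.load_file`.

```python
import math
from collections import defaultdict
from typing import Mapping


def params_per_module(weights: Mapping[str, object]) -> dict[str, int]:
    """Sum parameter counts by top-level module name (text before the first '.')."""
    totals: dict[str, int] = defaultdict(int)
    for name, tensor in weights.items():
        top = name.split(".", 1)[0]
        totals[top] += math.prod(tensor.shape)
    return dict(totals)


def unexpected_modules(
    weights: Mapping[str, object], expected=("enc_p", "flow", "dec", "emb_g")
) -> set[str]:
    """Return top-level names that are not among the expected inference modules."""
    return set(params_per_module(weights)) - set(expected)


if __name__ == "__main__":
    from types import SimpleNamespace

    # Illustrative fake tensors standing in for arrays loaded from a .safetensors file
    fake = {
        "emb_g.weight": SimpleNamespace(shape=(109, 256)),
        "dec.example.weight": SimpleNamespace(shape=(8, 4)),
        "enc_q.example.weight": SimpleNamespace(shape=(2, 3)),
    }
    print(params_per_module(fake))
    print(unexpected_modules(fake))  # {'enc_q'}
```

On a real file, `unexpected_modules(load_file(weights_path))` should return an empty set if the assumed key naming holds.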

### Upsampling Rates

| Sample Rate | Upsample Rates | Total Factor |
|-------------|----------------|--------------|
| 32 kHz | [10, 8, 2, 2] | 320x |
| 40 kHz | [10, 10, 2, 2] | 400x |
| 48 kHz | [12, 10, 2, 2] | 480x |
92
+ ## Original Source
93
+
94
+ These weights are converted from the official RVC pretrained models:
95
+ - **Source**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
96
+ - **Files**: `pretrained_v2/f0G{32k,40k,48k}.pth`
97
+
98
+ ## License
99
+
100
+ MIT License - same as the original RVC project.

## Citation

If you use these weights, please cite the original RVC project:

```bibtex
@software{rvc2023,
  author = {RVC-Project},
  title = {Retrieval-based-Voice-Conversion-WebUI},
  year = {2023},
  url = {https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI}
}
```
v2/config.json ADDED
@@ -0,0 +1,71 @@
{
  "32000": {
    "model_type": "SynthesizerTrnMs768NSFsid",
    "version": "v2",
    "sample_rate": 32000,
    "f0": true,
    "spec_channels": 1025,
    "segment_size": 32,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [10, 8, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [20, 16, 4, 4],
    "spk_embed_dim": 109,
    "gin_channels": 256
  },
  "40000": {
    "model_type": "SynthesizerTrnMs768NSFsid",
    "version": "v2",
    "sample_rate": 40000,
    "f0": true,
    "spec_channels": 1025,
    "segment_size": 32,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [10, 10, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [20, 20, 4, 4],
    "spk_embed_dim": 109,
    "gin_channels": 256
  },
  "48000": {
    "model_type": "SynthesizerTrnMs768NSFsid",
    "version": "v2",
    "sample_rate": 48000,
    "f0": true,
    "spec_channels": 1025,
    "segment_size": 32,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [12, 10, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [24, 20, 4, 4],
    "spk_embed_dim": 109,
    "gin_channels": 256
  }
}
v2/f0G32k.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4579a923bb57bad01ea9225a5fb6b641cd85c99c230e901e0d50705cfc3e6f05
size 112277704
v2/f0G40k.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccc4dda2fbbe8ec4ad0b614aa004934bfe54087b09fdcdd5cc86c082594a41fb
size 110196928
v2/f0G48k.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c51f93b025a70a7cea32432de1518d1842fec4336772066cc7be2c981189ba24
size 114915560