| | --- |
| | license: mit |
| | library_name: mlx |
| | tags: |
| | - mlx |
| | - voice-conversion |
| | - rvc |
| | - apple-silicon |
| | - audio |
| | - speech |
| | --- |
| | |
| | # RVC-MLX Pretrained Weights |
| |
|
| | MLX-compatible pretrained weights for [RVC (Retrieval-based Voice Conversion)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI), converted for use with [rvc-mlx](https://github.com/lucasnewman/rvc-mlx). |
| |
|
| | These weights enable high-quality voice conversion on Apple Silicon Macs using the MLX framework. |
| |
|
| | ## Available Models |
| |
|
| | | File | Sample Rate | Size | Description | |
| | |------|-------------|------|-------------| |
| | | `v2/f0G48k.safetensors` | 48 kHz | 110 MB | V2 with F0 (pitch) - highest quality | |
| | | `v2/f0G40k.safetensors` | 40 kHz | 105 MB | V2 with F0 (pitch) | |
| | | `v2/f0G32k.safetensors` | 32 kHz | 107 MB | V2 with F0 (pitch) | |
| |
|
| | All models use: |
| | - **Architecture**: SynthesizerTrnMs768NSFsid |
| | - **Input**: 768-dim ContentVec features |
| | - **F0 Support**: Yes (pitch-aware synthesis) |
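
The file names and config keys follow a regular pattern, so a small helper can map a target sample rate to both. This is an illustrative sketch only: `model_files` is a hypothetical helper, not part of rvc-mlx, and it assumes the keys in `v2/config.json` are the sample rates in Hz written as strings, as in the usage example in this README.

```python
# Hypothetical helper (not part of rvc-mlx): map a sample rate in Hz to the
# weight file name and the config key used in this repository.
SUPPORTED_RATES = (32000, 40000, 48000)


def model_files(sample_rate: int) -> tuple[str, str]:
    """Return (weights filename, config key) for a supported sample rate."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    filename = f"v2/f0G{sample_rate // 1000}k.safetensors"
    return filename, str(sample_rate)
```

For example, `model_files(40000)` yields the file name to pass to `hf_hub_download` and the key to look up in the loaded config.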
| |
|
| | ## Quick Start |
| |
|
| | ```python |
| | from huggingface_hub import hf_hub_download |
| | |
| | # Download the 48kHz model |
| | weights_path = hf_hub_download( |
| |     repo_id="lexandstuff/rvc-mlx-weights", |
| |     filename="v2/f0G48k.safetensors" |
| | ) |
| | |
| | # Download config |
| | config_path = hf_hub_download( |
| |     repo_id="lexandstuff/rvc-mlx-weights", |
| |     filename="v2/config.json" |
| | ) |
| | ``` |
| |
|
| | ## Usage with rvc-mlx |
| |
|
| | ```python |
| | import json |
| | from safetensors.numpy import load_file |
| | from rvc_mlx.models import SynthesizerTrnMs768NSFsid |
| | |
| | # Load config |
| | with open(config_path) as f: |
| |     configs = json.load(f) |
| | config = configs["48000"] # or "40000", "32000" |
| | |
| | # Create model |
| | model = SynthesizerTrnMs768NSFsid(**config) |
| | |
| | # Load weights |
| | weights = load_file(weights_path) |
| | # ... load weights into model |
| | ``` |
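
One possible way to finish the loading step, assuming the model is an `mlx.nn.Module` and that the parameter names stored in the `.safetensors` file match the model's own. MLX modules expose `load_weights`, which accepts a path to a `.safetensors` file. `load_into` is a hypothetical wrapper, and whether rvc-mlx needs any extra key remapping is not confirmed here.

```python
def load_into(model, weights_path, strict=True):
    """Load a .safetensors file into an MLX module (sketch).

    Assumes `model` is an mlx.nn.Module (for example the
    SynthesizerTrnMs768NSFsid instance created above) and that the
    file's keys match the model's parameter names.
    """
    # mlx.nn.Module.load_weights reads .safetensors files directly; with
    # strict=True it raises if the weights do not exactly match the model.
    model.load_weights(str(weights_path), strict=strict)
    return model
```

Passing `strict=False` can help when diagnosing a mismatch, since it loads only the keys that are present in both the file and the model.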
| |
|
| | ## Model Details |
| |
|
| | These are **inference-only** weights: the training-only components (the posterior encoder) have been removed to reduce file size. |
| |
|
| | ### Architecture |
| |
|
| | ``` |
| | SynthesizerTrnMs768NSFsid |
| | ├── enc_p (TextEncoder) - Encodes ContentVec + pitch |
| | ├── flow (ResidualCoupling) - Normalizing flow for voice conversion |
| | ├── dec (GeneratorNSF) - HiFi-GAN vocoder with neural source filter |
| | └── emb_g (Embedding) - Speaker embedding |
| | ``` |
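
To check that a downloaded file matches this layout, the weight names can be grouped by their top-level module. The sketch below assumes dot-separated keys prefixed with the module names shown in the tree, and that a posterior encoder would appear under `enc_q`, its name in the original PyTorch RVC code. Neither assumption is confirmed for these converted files, `summarize_modules` is a hypothetical helper, and the key names in the example are made up for illustration.

```python
from collections import Counter


def summarize_modules(keys):
    """Count weight tensors per top-level module name."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Illustrative key names only; a real file contains many more entries.
example_keys = [
    "enc_p.emb_phone.weight",
    "flow.flows.0.pre.weight",
    "dec.conv_pre.weight",
    "dec.conv_pre.bias",
    "emb_g.weight",
]
counts = summarize_modules(example_keys)

# An inference-only file should contain no posterior-encoder weights.
assert "enc_q" not in counts
print(dict(counts))
```

With a real file, pass the keys of the dictionary returned by `load_file(weights_path)` instead of `example_keys`.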
| |
|
| | ### Upsampling Rates |
| |
|
| | | Sample Rate | Upsample Rates | Total Factor | |
| | |-------------|----------------|--------------| |
| | | 32 kHz | [10, 8, 2, 2] | 320x | |
| | | 40 kHz | [10, 10, 2, 2] | 400x | |
| | | 48 kHz | [12, 10, 2, 2] | 480x | |
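
The total factor is the product of the per-stage upsample rates, that is, the number of output samples the decoder produces per input feature frame. As a quick arithmetic check (a sketch using only the values in the table), dividing each sample rate by its factor gives the same 100 frames per second for all three models:

```python
from math import prod

# Values copied from the table above.
UPSAMPLE_RATES = {
    32000: [10, 8, 2, 2],
    40000: [10, 10, 2, 2],
    48000: [12, 10, 2, 2],
}

for sample_rate, rates in UPSAMPLE_RATES.items():
    factor = prod(rates)  # output samples per feature frame
    frames_per_second = sample_rate / factor
    print(f"{sample_rate} Hz: {factor}x, {frames_per_second:.0f} frames/s")
```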
| |
|
| | ## Original Source |
| |
|
| | These weights are converted from the official RVC pretrained models: |
| | - **Source**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI) |
| | - **Files**: `pretrained_v2/f0G{32k,40k,48k}.pth` |
| |
|
| | ## License |
| |
|
| | MIT License - same as the original RVC project. |
| |
|
| | ## Citation |
| |
|
| | If you use these weights, please cite the original RVC project: |
| |
|
| | ```bibtex |
| | @software{rvc2023, |
| |   author = {RVC-Project}, |
| |   title = {Retrieval-based-Voice-Conversion-WebUI}, |
| |   year = {2023}, |
| |   url = {https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI} |
| | } |
| | ``` |
| |
|