---
license: mit
library_name: mlx
tags:
- mlx
- voice-conversion
- rvc
- apple-silicon
- audio
- speech
---

# RVC-MLX Pretrained Weights

MLX-compatible pretrained weights for [RVC (Retrieval-based Voice Conversion)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI), converted for use with [rvc-mlx](https://github.com/lucasnewman/rvc-mlx).

These weights enable high-quality voice conversion on Apple Silicon Macs using the MLX framework.

## Available Models

| File | Sample Rate | Size | Description |
|------|-------------|------|-------------|
| `v2/f0G48k.safetensors` | 48 kHz | 110 MB | V2 with F0 (pitch), highest quality |
| `v2/f0G40k.safetensors` | 40 kHz | 105 MB | V2 with F0 (pitch) |
| `v2/f0G32k.safetensors` | 32 kHz | 107 MB | V2 with F0 (pitch) |

All models use:

- **Architecture**: SynthesizerTrnMs768NSFsid
- **Input**: 768-dim ContentVec features
- **F0 Support**: Yes (pitch-aware synthesis)

## Quick Start

```python
from huggingface_hub import hf_hub_download

# Download the 48 kHz model weights
weights_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/f0G48k.safetensors",
)

# Download the shared config (one entry per sample rate)
config_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/config.json",
)
```

## Usage with rvc-mlx

```python
import json

from safetensors.numpy import load_file
from rvc_mlx.models import SynthesizerTrnMs768NSFsid

# config_path and weights_path come from the Quick Start snippet above

# Load the config for the chosen sample rate
with open(config_path) as f:
    configs = json.load(f)
config = configs["48000"]  # or "40000", "32000"

# Create the model
model = SynthesizerTrnMs768NSFsid(**config)

# Load weights as a dict of tensor name -> numpy array
weights = load_file(weights_path)
# ... load weights into model
```

## Model Details

These are **inference-only** weights: training components (the posterior encoder) have been removed to reduce file size.
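To check what a downloaded file actually contains, for example that no posterior-encoder tensors are present, you can total the parameter count per top-level module. This is a minimal sketch, not part of rvc-mlx: the helper `params_by_module` is hypothetical, and it assumes tensor names are dot-separated with the top-level module first (for instance a name beginning with `dec.`), which may not match the real key layout of these files.

```python
from collections import defaultdict
from math import prod


def params_by_module(shapes):
    """Total parameter counts per top-level module.

    `shapes` maps tensor names to shape tuples. Names are ASSUMED to be
    dot-separated with the top-level module first (e.g. "dec.<...>"),
    which may not match the actual key layout of these files.
    """
    totals = defaultdict(int)
    for name, shape in shapes.items():
        module = name.split(".", 1)[0]  # text before the first dot
        totals[module] += prod(shape)   # prod(()) == 1 for scalar tensors
    return dict(sorted(totals.items()))
```

With `weights_path` from Quick Start, `params_by_module({k: v.shape for k, v in load_file(weights_path).items()})` (using `safetensors.numpy.load_file`) gives per-module totals. Under the naming assumption above, an `enc_q` entry (the posterior encoder's attribute name in the original PyTorch RVC code) would be expected to be absent.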
### Architecture

```
SynthesizerTrnMs768NSFsid
├── enc_p (TextEncoder) - Encodes ContentVec + pitch
├── flow (ResidualCoupling) - Normalizing flow for voice conversion
├── dec (GeneratorNSF) - HiFi-GAN vocoder with neural source filter
└── emb_g (Embedding) - Speaker embedding
```

### Upsampling Rates

| Sample Rate | Upsample Rates | Total Factor |
|-------------|----------------|--------------|
| 32 kHz | [10, 8, 2, 2] | 320x |
| 40 kHz | [10, 10, 2, 2] | 400x |
| 48 kHz | [12, 10, 2, 2] | 480x |

The total factor is the product of the per-stage rates (for example, 12 × 10 × 2 × 2 = 480), i.e. the number of output audio samples the vocoder generates per latent frame. At every sample rate this works out to 100 frames per second (10 ms per frame).

## Original Source

These weights are converted from the official RVC pretrained models:

- **Source**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Files**: `pretrained_v2/f0G{32k,40k,48k}.pth`

## License

MIT License, the same as the original RVC project.

## Citation

If you use these weights, please cite the original RVC project:

```bibtex
@software{rvc2023,
  author = {RVC-Project},
  title = {Retrieval-based-Voice-Conversion-WebUI},
  year = {2023},
  url = {https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI}
}
```