---
license: mit
library_name: mlx
tags:
- mlx
- voice-conversion
- rvc
- apple-silicon
- audio
- speech
---
# RVC-MLX Pretrained Weights
MLX-compatible pretrained weights for [RVC (Retrieval-based Voice Conversion)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI), converted for use with [rvc-mlx](https://github.com/lucasnewman/rvc-mlx).
These weights enable high-quality voice conversion on Apple Silicon Macs using the MLX framework.
## Available Models
| File | Sample Rate | Size | Description |
|------|-------------|------|-------------|
| `v2/f0G48k.safetensors` | 48 kHz | 110 MB | V2 with F0 (pitch) - highest quality |
| `v2/f0G40k.safetensors` | 40 kHz | 105 MB | V2 with F0 (pitch) |
| `v2/f0G32k.safetensors` | 32 kHz | 107 MB | V2 with F0 (pitch) |
All models use:
- **Architecture**: SynthesizerTrnMs768NSFsid
- **Input**: 768-dim ContentVec features
- **F0 Support**: Yes (pitch-aware synthesis)
## Quick Start
```python
from huggingface_hub import hf_hub_download
# Download the 48kHz model
weights_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/f0G48k.safetensors",
)
# Download config
config_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/config.json",
)
```
## Usage with rvc-mlx
```python
import json
from safetensors.numpy import load_file
from rvc_mlx.models import SynthesizerTrnMs768NSFsid
# Load config
with open(config_path) as f:
    configs = json.load(f)
config = configs["48000"] # or "40000", "32000"
# Create model
model = SynthesizerTrnMs768NSFsid(**config)
# Load weights
weights = load_file(weights_path)
# ... load weights into model
```
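The weights file name and the config key both follow from the sample rate, so the choice can be made in one place. The sketch below is a minimal illustration, not part of rvc-mlx: `model_files` and `SUPPORTED_RATES` are hypothetical names, and the file names and keys are taken from the table and snippets above.

```python
# Sample rates (Hz) with published weights, per the Available Models table.
SUPPORTED_RATES = (32000, 40000, 48000)


def model_files(sample_rate: int) -> dict:
    """Hypothetical helper: map a sample rate to repo file names and config key."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(
            f"unsupported sample rate {sample_rate}; expected one of {SUPPORTED_RATES}"
        )
    return {
        # e.g. 48000 -> "v2/f0G48k.safetensors"
        "weights": f"v2/f0G{sample_rate // 1000}k.safetensors",
        # One shared config file, keyed by sample rate as a string.
        "config": "v2/config.json",
        "config_key": str(sample_rate),
    }


files = model_files(40000)
print(files["weights"], files["config_key"])
```

The returned `weights` and `config` values can be passed as `filename` to `hf_hub_download`, and `config_key` used to index the loaded config.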
## Model Details
These are **inference-only** weights: the training-only components (the posterior encoder) have been removed to reduce file size.
### Architecture
```
SynthesizerTrnMs768NSFsid
├── enc_p (TextEncoder) - Encodes ContentVec + pitch
├── flow (ResidualCoupling) - Normalizing flow for voice conversion
├── dec (GeneratorNSF) - HiFi-GAN vocoder with neural source filter
└── emb_g (Embedding) - Speaker embedding
```
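One way to sanity-check a downloaded file against this tree is to group its parameter names by top-level submodule. The sketch below assumes that the safetensors keys are dot-separated paths beginning with the submodule name (for example `dec.…`), which this README does not confirm. The example keys are hypothetical, and `enc_q` is the posterior encoder's attribute name in the original PyTorch RVC code, not necessarily in these converted files.

```python
from collections import Counter

# Top-level submodules listed in the architecture tree above.
EXPECTED_MODULES = {"enc_p", "flow", "dec", "emb_g"}


def summarize_keys(keys):
    """Count parameters per top-level submodule ('dec.x.weight' -> 'dec')."""
    return Counter(key.split(".", 1)[0] for key in keys)


def unexpected_modules(keys):
    """Return top-level names that are not part of the inference-only tree."""
    return set(summarize_keys(keys)) - EXPECTED_MODULES


# Hypothetical key names, for illustration only.
example_keys = [
    "enc_p.emb_phone.weight",
    "flow.flows.0.enc.in_layers.0.weight",
    "dec.conv_pre.weight",
    "emb_g.weight",
]
counts = summarize_keys(example_keys)
print(dict(counts))
print(unexpected_modules(example_keys))
```

With real weights, the keys of the dictionary returned by `load_file(weights_path)` could be passed in place of `example_keys`.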
### Upsampling Rates
| Sample Rate | Upsample Rates | Total Factor |
|-------------|----------------|--------------|
| 32 kHz | [10, 8, 2, 2] | 320x |
| 40 kHz | [10, 10, 2, 2] | 400x |
| 48 kHz | [12, 10, 2, 2] | 480x |
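The Total Factor column is the product of the per-stage upsample rates. The sketch below recomputes it from the table. It also derives a frame rate on the assumption that the decoder emits one block of `Total Factor` samples per input feature frame; that reading of the factor is an inference from the table, not something stated here. Function names are illustrative.

```python
from math import prod

# Per-stage upsample rates, copied from the table above.
UPSAMPLE_RATES = {
    32000: [10, 8, 2, 2],
    40000: [10, 10, 2, 2],
    48000: [12, 10, 2, 2],
}


def total_upsample_factor(sample_rate: int) -> int:
    """Product of the per-stage rates (the Total Factor column)."""
    return prod(UPSAMPLE_RATES[sample_rate])


def frames_per_second(sample_rate: int) -> float:
    """Feature frames per second, assuming one frame per total-factor samples."""
    return sample_rate / total_upsample_factor(sample_rate)


for sr in sorted(UPSAMPLE_RATES):
    print(sr, total_upsample_factor(sr), frames_per_second(sr))
```

Under that assumption, all three models consume features at the same rate of 100 frames per second (10 ms per frame), and only the output sample rate differs.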
## Original Source
These weights are converted from the official RVC pretrained models:
- **Source**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Files**: `pretrained_v2/f0G{32k,40k,48k}.pth`
## License
MIT License - same as the original RVC project.
## Citation
If you use these weights, please cite the original RVC project:
```bibtex
@software{rvc2023,
author = {RVC-Project},
title = {Retrieval-based-Voice-Conversion-WebUI},
year = {2023},
url = {https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI}
}
```