---
license: mit
library_name: mlx
tags:
  - mlx
  - voice-conversion
  - rvc
  - apple-silicon
  - audio
  - speech
---

# RVC-MLX Pretrained Weights

MLX-compatible pretrained weights for [RVC (Retrieval-based Voice Conversion)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI), converted for use with [rvc-mlx](https://github.com/lucasnewman/rvc-mlx).

These weights enable high-quality voice conversion on Apple Silicon Macs using the MLX framework.

## Available Models

| File | Sample Rate | Size | Description |
|------|-------------|------|-------------|
| `v2/f0G48k.safetensors` | 48 kHz | 110 MB | V2 with F0 (pitch) - highest quality |
| `v2/f0G40k.safetensors` | 40 kHz | 105 MB | V2 with F0 (pitch) |
| `v2/f0G32k.safetensors` | 32 kHz | 107 MB | V2 with F0 (pitch) |

All models use:
- **Architecture**: SynthesizerTrnMs768NSFsid
- **Input**: 768-dim ContentVec features
- **F0 Support**: Yes (pitch-aware synthesis)

## Quick Start

```python
from huggingface_hub import hf_hub_download

# Download the 48kHz model
weights_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/f0G48k.safetensors"
)

# Download config
config_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/config.json"
)
```
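The three weight files share one naming pattern, and the usage example below indexes `config.json` by the sample rate in Hz as a string. A small helper can derive both from a single number. This is a minimal sketch: `files_for_sample_rate` and `SUPPORTED_RATES` are hypothetical names, not part of rvc-mlx or this repository.

```python
# Sample rates listed in the "Available Models" table.
SUPPORTED_RATES = (32000, 40000, 48000)


def files_for_sample_rate(sample_rate: int) -> tuple[str, str]:
    """Return (weights filename, config key) for a supported sample rate in Hz."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(
            f"unsupported sample rate {sample_rate}; expected one of {SUPPORTED_RATES}"
        )
    # File names follow the pattern v2/f0G{rate in kHz}k.safetensors.
    weights_file = f"v2/f0G{sample_rate // 1000}k.safetensors"
    # Assumption: config.json is keyed by the rate in Hz as a string, e.g. "48000".
    config_key = str(sample_rate)
    return weights_file, config_key
```

The returned filename can be passed as `filename=` to `hf_hub_download`, and the key used to index the loaded config.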

## Usage with rvc-mlx

```python
import json
from safetensors.numpy import load_file
from rvc_mlx.models import SynthesizerTrnMs768NSFsid

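# config_path and weights_path come from the Quick Start snippet above.
# Load the config and select the entry for the chosen sample rate.
with open(config_path) as f:
    configs = json.load(f)
config = configs["48000"]  # or "40000", "32000"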

# Create model
model = SynthesizerTrnMs768NSFsid(**config)

# Load weights
weights = load_file(weights_path)
# ... load weights into model
```

## Model Details

These are **inference-only** weights: components needed only for training (the posterior encoder) have been removed to reduce file size.

### Architecture

```
SynthesizerTrnMs768NSFsid
├── enc_p (TextEncoder)      - Encodes ContentVec + pitch
├── flow (ResidualCoupling)  - Normalizing flow for voice conversion
├── dec (GeneratorNSF)       - HiFi-GAN vocoder with neural source filter
└── emb_g (Embedding)        - Speaker embedding
```
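One way to sanity-check a downloaded file against this layout is to group its tensor names by their top-level prefix. The sketch below assumes the safetensors keys are dot-separated and start with the submodule names shown above, and that the removed posterior encoder would appear as `enc_q` (its name in the upstream PyTorch checkpoints). `summarize_submodules` is a hypothetical helper, not an rvc-mlx function.

```python
from collections import Counter

# Submodules listed in the architecture tree above.
EXPECTED = ("enc_p", "flow", "dec", "emb_g")
# Assumption: upstream name of the posterior encoder, which should be absent.
TRAINING_ONLY = ("enc_q",)


def summarize_submodules(keys):
    """Count tensors per top-level submodule and report anything unexpected.

    Returns (counts, missing, leftover) where `missing` lists expected
    submodules with no tensors and `leftover` lists training-only
    submodules that are still present.
    """
    counts = Counter(key.split(".", 1)[0] for key in keys)
    missing = [name for name in EXPECTED if name not in counts]
    leftover = [name for name in TRAINING_ONLY if name in counts]
    return dict(counts), missing, leftover
```

With `weights = load_file(weights_path)` from the usage example, calling `summarize_submodules(weights.keys())` should return empty `missing` and `leftover` lists if the key-naming assumptions hold.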

### Upsampling Rates

| Sample Rate | Upsample Rates | Total Factor |
|-------------|----------------|--------------|
| 32 kHz | [10, 8, 2, 2] | 320x |
| 40 kHz | [10, 10, 2, 2] | 400x |
| 48 kHz | [12, 10, 2, 2] | 480x |
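The total factor is the product of the per-stage upsample rates. Assuming it equals the number of output samples generated per input feature frame, dividing the sample rate by it gives the frame rate the decoder expects. The sketch below reproduces the table's arithmetic; `hop_and_frame_rate` is a hypothetical helper name.

```python
from math import prod

# Per-stage upsample rates from the table above, keyed by sample rate in Hz.
UPSAMPLE_RATES = {
    32000: [10, 8, 2, 2],
    40000: [10, 10, 2, 2],
    48000: [12, 10, 2, 2],
}


def hop_and_frame_rate(sample_rate):
    """Return (samples per input frame, input frames per second)."""
    hop = prod(UPSAMPLE_RATES[sample_rate])
    return hop, sample_rate / hop


for sr in UPSAMPLE_RATES:
    hop, fps = hop_and_frame_rate(sr)
    print(f"{sr} Hz: {hop} samples per frame, {fps:.0f} frames/s")
```

Under that assumption all three models work out to 100 frames per second, so each input frame covers 10 ms of audio regardless of the output sample rate.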

## Original Source

These weights are converted from the official RVC pretrained models:
- **Source**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Files**: `pretrained_v2/f0G{32k,40k,48k}.pth`

## License

MIT License - same as the original RVC project.

## Citation

If you use these weights, please cite the original RVC project:

```bibtex
@software{rvc2023,
  author = {RVC-Project},
  title = {Retrieval-based-Voice-Conversion-WebUI},
  year = {2023},
  url = {https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI}
}
```