---
license: mit
library_name: mlx
tags:
- mlx
- voice-conversion
- rvc
- apple-silicon
- audio
- speech
---

# RVC-MLX Pretrained Weights

MLX-compatible pretrained weights for [RVC (Retrieval-based Voice Conversion)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI), converted for use with [rvc-mlx](https://github.com/lucasnewman/rvc-mlx).

These weights enable high-quality voice conversion on Apple Silicon Macs using the MLX framework.

## Available Models

| File | Sample Rate | Size | Description |
|------|-------------|------|-------------|
| `v2/f0G48k.safetensors` | 48 kHz | 110 MB | V2 with F0 (pitch), highest quality |
| `v2/f0G40k.safetensors` | 40 kHz | 105 MB | V2 with F0 (pitch) |
| `v2/f0G32k.safetensors` | 32 kHz | 107 MB | V2 with F0 (pitch) |

All models use:
- Architecture: `SynthesizerTrnMs768NSFsid`
- Input: 768-dimensional ContentVec features
- F0 support: yes (pitch-aware synthesis)
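
Each weights file pairs with a config entry keyed by its sample rate (the `configs["48000"]` lookup under Usage), so a small lookup table can keep the two in sync. A minimal sketch; `MODEL_FILES` and `select_model` are hypothetical names, not part of rvc-mlx:

```python
# Hypothetical helper (not part of rvc-mlx): map a target sample rate to the
# weights file in this repo and the matching key in v2/config.json.
MODEL_FILES = {
    32000: "v2/f0G32k.safetensors",
    40000: "v2/f0G40k.safetensors",
    48000: "v2/f0G48k.safetensors",
}


def select_model(sample_rate: int) -> tuple[str, str]:
    """Return (weights filename, config key) for a supported sample rate."""
    if sample_rate not in MODEL_FILES:
        supported = ", ".join(str(sr) for sr in sorted(MODEL_FILES))
        raise ValueError(f"unsupported sample rate {sample_rate}; expected one of {supported}")
    # Config entries are keyed by the sample rate as a string, e.g. "48000".
    return MODEL_FILES[sample_rate], str(sample_rate)


filename, config_key = select_model(40000)
print(filename, config_key)  # v2/f0G40k.safetensors 40000
```

Raising on unsupported rates (such as 44100) avoids silently loading a model whose upsampling does not match the audio you feed it.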

## Quick Start

```python
from huggingface_hub import hf_hub_download

# Download the 48 kHz model
weights_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/f0G48k.safetensors",
)

# Download the config (one file with entries for all three sample rates)
config_path = hf_hub_download(
    repo_id="lexandstuff/rvc-mlx-weights",
    filename="v2/config.json",
)
```

## Usage with rvc-mlx

```python
import json

from safetensors.numpy import load_file
from rvc_mlx.models import SynthesizerTrnMs768NSFsid

# weights_path and config_path come from the Quick Start snippet above.

# Load the per-sample-rate configs
with open(config_path) as f:
    configs = json.load(f)
config = configs["48000"]  # or "40000", "32000"; must match the weights file

# Create model
model = SynthesizerTrnMs768NSFsid(**config)

# Load weights
weights = load_file(weights_path)
# ... load weights into model
```
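
The final loading step above is left open. Before wiring weights into a model, it can help to confirm that the checkpoint's tensor names match the model's parameter names. A minimal sketch in plain Python; `diff_keys` is a hypothetical helper, and the tensor names in the toy example are made up. If the rvc-mlx model is an `mlx.nn.Module` (an assumption), its parameter names could come from `mlx.utils.tree_flatten(model.parameters())`:

```python
from typing import Iterable, Mapping


def diff_keys(checkpoint: Mapping[str, object], model_names: Iterable[str]) -> tuple[list[str], list[str]]:
    """Compare checkpoint tensor names against model parameter names.

    Returns (missing, unexpected):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not define
    """
    ckpt_names = set(checkpoint)
    expected = set(model_names)
    missing = sorted(expected - ckpt_names)
    unexpected = sorted(ckpt_names - expected)
    return missing, unexpected


# Toy example with made-up names; real names depend on the rvc-mlx model.
ckpt = {"emb_g.weight": None, "dec.conv_pre.weight": None}
model_params = ["emb_g.weight", "dec.conv_pre.weight", "dec.conv_pre.bias"]
print(diff_keys(ckpt, model_params))  # (['dec.conv_pre.bias'], [])
```

If both lists come back empty and the model is an `mlx.nn.Module`, MLX's `Module.load_weights(weights_path)` can read the `.safetensors` file directly; with its default `strict=True` it raises on any name or shape mismatch.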

## Model Details

These are inference-only weights: the training-only components (the posterior encoder) have been removed to reduce file size.
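
Removing the posterior encoder amounts to dropping every tensor that belongs to it before saving. A minimal sketch, assuming the original RVC naming, in which the posterior encoder is the `enc_q` submodule and its tensors are named `enc_q.<...>`; `strip_training_weights` is a hypothetical helper, not the script that produced these files:

```python
from typing import Mapping, TypeVar

T = TypeVar("T")

# Assumption: as in the original RVC PyTorch models, the posterior encoder
# is registered as `enc_q`, so its tensors are named "enc_q.<...>".
TRAINING_ONLY_PREFIXES = ("enc_q.",)


def strip_training_weights(state: Mapping[str, T]) -> dict[str, T]:
    """Return a copy of `state` without posterior-encoder tensors."""
    return {
        name: tensor
        for name, tensor in state.items()
        if not name.startswith(TRAINING_ONLY_PREFIXES)
    }


# Illustrative names and placeholder values instead of real arrays.
state = {"enc_p.emb_phone.weight": 1, "enc_q.pre.weight": 2, "dec.conv_pre.weight": 3}
print(sorted(strip_training_weights(state)))  # ['dec.conv_pre.weight', 'enc_p.emb_phone.weight']
```

The same filter works on a dict returned by `safetensors.numpy.load_file`, and the result can be written back with `safetensors.numpy.save_file`.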

### Architecture

```
SynthesizerTrnMs768NSFsid
├── enc_p (TextEncoder) - Encodes ContentVec + pitch
├── flow (ResidualCoupling) - Normalizing flow for voice conversion
├── dec (GeneratorNSF) - HiFi-GAN vocoder with neural source filter
└── emb_g (Embedding) - Speaker embedding
```
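
Because the tree names the four top-level components by attribute, checkpoint tensors can be grouped by their first dotted segment to see how parameters are distributed across them. A sketch, assuming tensor names follow a `component.sub.name` pattern; `params_per_component` is hypothetical and takes a name-to-shape mapping so it does not depend on any array library. The shapes below are made up:

```python
import math
from collections import defaultdict
from typing import Mapping, Sequence


def params_per_component(shapes: Mapping[str, Sequence[int]]) -> dict[str, int]:
    """Sum parameter counts per top-level module (the text before the first '.')."""
    totals: dict[str, int] = defaultdict(int)
    for name, shape in shapes.items():
        component = name.split(".", 1)[0]
        totals[component] += math.prod(shape)  # math.prod(()) == 1 for scalars
    return dict(totals)


# Made-up shapes for illustration only.
shapes = {
    "emb_g.weight": (1, 256),
    "dec.conv_pre.weight": (512, 192, 7),
    "dec.conv_pre.bias": (512,),
}
print(params_per_component(shapes))  # {'emb_g': 256, 'dec': 688640}
```

With a real file, the shapes could be gathered via `safetensors.safe_open(path, framework="numpy")` as `{k: f.get_tensor(k).shape for k in f.keys()}`.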

### Upsampling Rates

| Sample Rate | Upsample Rates | Total Factor |
|-------------|----------------|--------------|
| 32 kHz | [10, 8, 2, 2] | 320x |
| 40 kHz | [10, 10, 2, 2] | 400x |
| 48 kHz | [12, 10, 2, 2] | 480x |
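
The total factor is the product of the per-stage rates and equals the number of output samples the decoder produces per input frame. Dividing the sample rate by it shows that all three models run at the same 100 frames per second (a 10 ms hop). A quick check, with the rates copied from the table:

```python
import math

# Per-stage upsample rates from the table above, keyed by sample rate (Hz).
UPSAMPLE_RATES = {
    32000: [10, 8, 2, 2],
    40000: [10, 10, 2, 2],
    48000: [12, 10, 2, 2],
}

for sample_rate, rates in UPSAMPLE_RATES.items():
    factor = math.prod(rates)  # output samples per input frame
    frames_per_second = sample_rate / factor
    hop_ms = 1000 * factor / sample_rate
    print(f"{sample_rate} Hz: {factor}x, {frames_per_second:.0f} frames/s, {hop_ms:.0f} ms hop")

# Prints:
# 32000 Hz: 320x, 100 frames/s, 10 ms hop
# 40000 Hz: 400x, 100 frames/s, 10 ms hop
# 48000 Hz: 480x, 100 frames/s, 10 ms hop
```

So a clip of T frames decodes to T * factor samples at every rate; only the samples per frame change.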

## Original Source

These weights were converted from the official RVC pretrained models:
- Source: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- Files: `pretrained_v2/f0G{32k,40k,48k}.pth`

## License

MIT License, the same as the original RVC project.

## Citation

If you use these weights, please cite the original RVC project:

```bibtex
@software{rvc2023,
  author = {RVC-Project},
  title  = {Retrieval-based-Voice-Conversion-WebUI},
  year   = {2023},
  url    = {https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI}
}
```