---
license: mit
tags:
- audio
- voice-conversion
- rvc
- safetensors
- maestraea
pipeline_tag: audio-to-audio
---
# RVC Inference Models (Safetensors)
**Retrieval-Based Voice Conversion — V2 Pretrained Models**
[Original Source](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) · MIT License
> V2 pretrained models converted from PyTorch `.pth` checkpoints to the safetensors format (except HuBERT, which is kept in its original PyTorch format because fairseq is required to deserialize it). For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).
## What's in This Repo
### Core Models
| File | Size | Description |
|------|------|-------------|
| `hubert_base.pt` | 190 MB | HuBERT feature extractor (kept as .pt — requires fairseq) |
| `rmvpe.safetensors` | 181 MB | RMVPE pitch detection model |
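In practice these files are loaded with the `safetensors` library (e.g. `safetensors.torch.load_file`), but the format itself is simple enough to inspect with the standard library alone: an unsigned 64-bit little-endian length prefix followed by a UTF-8 JSON header describing each tensor. A minimal, stdlib-only sketch (the `demo.safetensors` file here is a tiny synthetic example, not one of the model files above):

```python
import json
import struct

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file without loading tensor data.

    The format begins with an unsigned 64-bit little-endian integer giving
    the byte length of a UTF-8 JSON header, followed by the header itself,
    then the raw tensor bytes.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Build a tiny demo file so the sketch is self-contained.
demo_header = {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(demo_header).encode("utf-8")
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header_bytes)))
    f.write(header_bytes)
    f.write(b"\x00" * 8)  # 8 bytes of tensor data (two float32 zeros)

print(read_safetensors_header("demo.safetensors"))
```

This is useful for quickly listing tensor names, dtypes, and shapes of a large checkpoint without reading the whole file into memory.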
### Pretrained V2 — Generator Models (Inference)
| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/G32k.safetensors` | 74 MB | 32 kHz |
| `pretrained_v2/G40k.safetensors` | 73 MB | 40 kHz |
| `pretrained_v2/G48k.safetensors` | 75 MB | 48 kHz |
| `pretrained_v2/f0G32k.safetensors` | 74 MB | 32 kHz (with F0) |
| `pretrained_v2/f0G40k.safetensors` | 73 MB | 40 kHz (with F0) |
| `pretrained_v2/f0G48k.safetensors` | 75 MB | 48 kHz (with F0) |
### Pretrained V2 — Discriminator Models (Training)
| File | Size | Sample Rate |
|------|------|-------------|
| `pretrained_v2/D32k.safetensors` | 143 MB | 32 kHz |
| `pretrained_v2/D40k.safetensors` | 143 MB | 40 kHz |
| `pretrained_v2/D48k.safetensors` | 143 MB | 48 kHz |
| `pretrained_v2/f0D32k.safetensors` | 143 MB | 32 kHz (with F0) |
| `pretrained_v2/f0D40k.safetensors` | 143 MB | 40 kHz (with F0) |
| `pretrained_v2/f0D48k.safetensors` | 143 MB | 48 kHz (with F0) |
**Total: ~1.7 GB** (inference-only subset of the full 80 GB RVC repo)
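The filenames in the two tables above follow one convention: an optional `f0` prefix, a `G`/`D` role letter, and the sample rate in kHz. A small helper can build the repo-relative path from those pieces (`pretrained_path` is an illustrative name, not part of any RVC or Mæstræa API):

```python
def pretrained_path(sample_rate, role="G", f0=False):
    """Build the repo-relative path of a pretrained V2 model file.

    sample_rate: one of 32000, 40000, 48000 (Hz)
    role: "G" (generator, inference) or "D" (discriminator, training)
    f0: True selects the pitch-conditioned (f0) variant
    """
    if sample_rate not in (32000, 40000, 48000):
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if role not in ("G", "D"):
        raise ValueError(f"role must be 'G' or 'D', got {role!r}")
    prefix = "f0" if f0 else ""
    return f"pretrained_v2/{prefix}{role}{sample_rate // 1000}k.safetensors"

print(pretrained_path(40000, role="G", f0=True))  # pretrained_v2/f0G40k.safetensors
```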
## What RVC Does
RVC (Retrieval-based Voice Conversion) converts vocals from one voice to another:
- **Batch mode** — Upload audio → convert → download result
- **Real-time mode** — Low-latency WebSocket streaming (planned)
- Voice models are small (~50–100 MB each) and user-provided
### Key Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `pitch_shift` | -12 to 12 | 0 | Semitone shift |
| `f0_method` | rmvpe/crepe/harvest | rmvpe | Pitch detection |
| `index_rate` | 0–1 | 0.75 | Retrieval index strength |
| `protect` | 0–0.5 | 0.33 | Protect voiceless consonants |
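The ranges and defaults in the table above can be enforced before dispatching a conversion job. A sketch of such a validator, assuming a plain-dict parameter shape (the dict layout and `validate_params` name are illustrative, not an actual Mæstræa or RVC API):

```python
# Bounds and defaults taken from the parameter table above.
PARAM_SPEC = {
    "pitch_shift": {"min": -12, "max": 12, "default": 0},
    "index_rate": {"min": 0.0, "max": 1.0, "default": 0.75},
    "protect": {"min": 0.0, "max": 0.5, "default": 0.33},
}
F0_METHODS = ("rmvpe", "crepe", "harvest")

def validate_params(params):
    """Fill in defaults and reject unknown or out-of-range values."""
    out = {name: spec["default"] for name, spec in PARAM_SPEC.items()}
    out["f0_method"] = "rmvpe"
    for name, value in params.items():
        if name == "f0_method":
            if value not in F0_METHODS:
                raise ValueError(f"f0_method must be one of {F0_METHODS}")
            out[name] = value
        elif name in PARAM_SPEC:
            spec = PARAM_SPEC[name]
            if not spec["min"] <= value <= spec["max"]:
                raise ValueError(f"{name}={value} outside [{spec['min']}, {spec['max']}]")
            out[name] = value
        else:
            raise ValueError(f"unknown parameter: {name}")
    return out

print(validate_params({"pitch_shift": 2}))
```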
### VRAM Requirements
- **Minimum**: ~2 GB
- **Recommended**: ~6 GB
## Usage with Mæstræa
Place the files from this repo in `~/.maestraea/models/rvc/`. Voice model files (`.pth` + `.index`) go in `~/.maestraea/models/rvc/voices/`.
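Since each voice is a `.pth` checkpoint with an optional `.index` retrieval file sharing the same stem, a loader might scan the voices directory and pair them up. A sketch under that assumption (`list_voice_models` is a hypothetical helper, not part of Mæstræa; the demo runs against a throwaway temp directory rather than `~/.maestraea`):

```python
import tempfile
from pathlib import Path

def list_voice_models(voices_dir):
    """Pair each voice .pth with its matching .index file, if present.

    Returns {stem: (pth_path, index_path_or_None)}.
    """
    voices_dir = Path(voices_dir)
    voices = {}
    for pth in sorted(voices_dir.glob("*.pth")):
        index = pth.with_suffix(".index")
        voices[pth.stem] = (pth, index if index.exists() else None)
    return voices

# Demo against a temporary directory instead of the real layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "alice.pth").touch()
    (Path(tmp) / "alice.index").touch()
    (Path(tmp) / "bob.pth").touch()  # no index file
    for name, (pth, index) in list_voice_models(tmp).items():
        print(name, "with index" if index else "no index")
```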
## License
MIT — same as the original RVC release.
## Credits
- **Model**: [RVC-Project](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
- **Original weights**: [lj1995/VoiceConversionWebUI](https://huggingface.co/lj1995/VoiceConversionWebUI)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)