lexandstuff/mlx-rmvpe

lexandstuff committed on Jan 19 · Commit 59b0ccc (verified) · 1 parent: ba61207

Add model card

Files changed (1): README.md (added, +98 -0)

---
license: mit
tags:
  - mlx
  - audio
  - pitch-estimation
  - f0
  - voice-conversion
  - rvc
  - apple-silicon
library_name: mlx
pipeline_tag: audio-to-audio
---

# MLX-RMVPE

MLX implementation of [RMVPE](https://arxiv.org/abs/2306.15412) (Robust Model for Vocal Pitch Estimation) for Apple Silicon.

## Model Description

RMVPE estimates the **fundamental frequency (F0)** of the vocal line in an audio signal, which voice conversion needs in order to preserve the original pitch and melody. Unlike pitch trackers designed for monophonic audio, such as CREPE and pYIN, RMVPE is designed to estimate vocal pitch directly from **polyphonic music**, making it well suited to singing voice conversion where background music may be present.

- **Architecture**: Deep U-Net with BiGRU layers
- **Parameters**: ~15.4M
- **Input**: 16 kHz mono audio
- **Output**: F0 in Hz at 100 frames per second (hop length of 160 samples)
- **Pitch range**: ~32 Hz to ~1975 Hz (360 bins at 20-cent resolution)
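
The frame rate follows directly from the hop length: 16000 / 160 = 100 frames per second, or one F0 value every 10 ms. As a rough sketch in plain NumPy (not part of the mlx-rmvpe API), the helpers below map frame indices to timestamps and F0 values to MIDI note numbers. The names `frame_times` and `hz_to_midi` are hypothetical; frame `i` is assumed to be centered at sample `i * 160`, as with a centered STFT, and 0 Hz is assumed to mark unvoiced frames, consistent with the `f0 > 0` filter in the Usage example.

```python
import numpy as np

SAMPLE_RATE = 16_000  # RMVPE input sample rate in Hz
HOP_LENGTH = 160      # samples between consecutive F0 frames (10 ms)


def frame_times(num_frames: int, sr: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> np.ndarray:
    """Time in seconds of each F0 frame (assumption: frame i is centered at sample i * hop)."""
    return np.arange(num_frames) * hop / sr


def hz_to_midi(f0: np.ndarray) -> np.ndarray:
    """Convert F0 in Hz to fractional MIDI note numbers (A4 = 440 Hz = note 69).

    Frames with F0 <= 0 are treated as unvoiced (assumption) and mapped to NaN.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    midi = np.full_like(f0, np.nan)
    voiced = f0 > 0
    midi[voiced] = 69.0 + 12.0 * np.log2(f0[voiced] / 440.0)
    return midi
```

For example, `frame_times(len(f0))` gives one timestamp per F0 value, which is convenient when plotting a pitch contour or aligning it with other frame-level features.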
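
## Usage

```bash
pip install mlx-rmvpe
```

```python
import librosa
import numpy as np
from mlx_rmvpe import RMVPE

# Load the model (weights are downloaded automatically)
model = RMVPE.from_pretrained()

# Load audio as 16 kHz mono, the input format RMVPE expects
audio, sr = librosa.load("singing.wav", sr=16000, mono=True)

# Extract F0 in Hz at 100 fps; convert to NumPy so boolean masking works
f0 = np.asarray(model.infer_from_audio(audio))
print(f"F0 shape: {f0.shape} at 100 fps")

# Unvoiced frames are 0 Hz, so summarize only the voiced ones
voiced = f0[f0 > 0]
if voiced.size > 0:
    print(f"Pitch range: {voiced.min():.1f} - {voiced.max():.1f} Hz")
```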
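
Voice conversion pipelines such as RVC typically let you transpose the extracted pitch before synthesis, for example to move a vocal into a different register. A minimal sketch, assuming `f0` is a NumPy array like the one produced above with 0 Hz marking unvoiced frames; `transpose_f0` and `voiced_fraction` are hypothetical helpers, not part of mlx-rmvpe. Multiplying by 2^(n/12) shifts the pitch by n equal-tempered semitones, and because the shift is multiplicative, unvoiced frames stay at 0.

```python
import numpy as np


def transpose_f0(f0: np.ndarray, semitones: float) -> np.ndarray:
    """Shift an F0 track (Hz) by a number of equal-tempered semitones.

    Unvoiced frames (0 Hz) remain 0 because the shift is a multiplication.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    return f0 * 2.0 ** (semitones / 12.0)


def voiced_fraction(f0: np.ndarray) -> float:
    """Fraction of frames with a detected pitch (F0 > 0); 0.0 for an empty track."""
    f0 = np.asarray(f0)
    return float(np.mean(f0 > 0)) if f0.size else 0.0
```

For instance, `transpose_f0(f0, 12)` raises the contour by one octave, and `voiced_fraction(f0)` is a quick check that the input actually contains detectable singing.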

## Manual Loading

```python
from huggingface_hub import hf_hub_download
from mlx_rmvpe import RMVPE

# Download the converted weights from the Hugging Face Hub
weights_path = hf_hub_download(
    repo_id="lexandstuff/mlx-rmvpe",
    filename="rmvpe.safetensors",
)

# Build the model, load the weights, and switch to inference mode
model = RMVPE()
model.load_weights(weights_path)
model.eval()
```
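
To sanity-check a downloaded `rmvpe.safetensors` file (for example, against the ~15.4M parameter figure under Model Description) without loading it into MLX, you can read the safetensors header directly: the file begins with an 8-byte little-endian header length, followed by a JSON header listing each tensor's dtype and shape. The sketch below uses only the Python standard library; `read_safetensors_header` and `count_safetensors_params` are hypothetical helpers, and the total they report depends on how the weights were converted (for instance, whether batch-norm running statistics are stored as tensors).

```python
from __future__ import annotations

import json
import math
import struct
from pathlib import Path


def read_safetensors_header(path: str | Path) -> dict:
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_safetensors_params(path: str | Path) -> int:
    """Sum the element counts of every tensor listed in the header."""
    header = read_safetensors_header(path)
    return sum(
        math.prod(info["shape"])  # math.prod([]) == 1 covers scalar tensors
        for name, info in header.items()
        if name != "__metadata__"  # optional string metadata, not a tensor
    )
```

For example, calling `count_safetensors_params(weights_path)` after the `hf_hub_download` call above prints the total number of stored values.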

## Technical Details

The MLX weights are converted from the original PyTorch checkpoint, and the MLX model's F0 output closely matches that of the PyTorch implementation:

| Metric (MLX vs. PyTorch) | Value |
|--------------------------|-------|
| Mean F0 difference | 1.29 Hz |
| Correlation | >0.99 |

See the [GitHub repository](https://github.com/lexandstuff/mlx-rmvpe) for implementation details and the full API reference.
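
The model card does not state exactly how these metrics were computed. One plausible way to run such a comparison yourself, assuming both implementations return F0 arrays of equal length at 100 fps with 0 Hz for unvoiced frames, is to compare only the frames that both tracks mark as voiced. `compare_f0` is a hypothetical helper; it reports the mean *absolute* difference and the Pearson correlation, which may not match the exact definitions behind the table above.

```python
import numpy as np


def compare_f0(f0_ref: np.ndarray, f0_test: np.ndarray) -> dict[str, float]:
    """Compare two F0 tracks (Hz) on frames where both are voiced (F0 > 0)."""
    f0_ref = np.asarray(f0_ref, dtype=np.float64)
    f0_test = np.asarray(f0_test, dtype=np.float64)
    if f0_ref.shape != f0_test.shape:
        raise ValueError(f"shape mismatch: {f0_ref.shape} vs {f0_test.shape}")

    both_voiced = (f0_ref > 0) & (f0_test > 0)
    ref, test = f0_ref[both_voiced], f0_test[both_voiced]
    if ref.size < 2:
        raise ValueError("need at least two frames voiced in both tracks")

    return {
        # Average absolute pitch error over jointly voiced frames
        "mean_abs_diff_hz": float(np.mean(np.abs(ref - test))),
        # Pearson correlation between the jointly voiced F0 sequences
        "correlation": float(np.corrcoef(ref, test)[0, 1]),
        # Share of all frames where the two tracks disagree on voicing
        "voicing_mismatch": float(np.mean((f0_ref > 0) != (f0_test > 0))),
    }
```

Reporting voicing disagreement separately matters because two implementations can agree closely on pitch values while still differing on which frames count as voiced.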

## Citation

```bibtex
@inproceedings{wei2023rmvpe,
  title={RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music},
  author={Wei, Haojie and Cao, Xueke and Dan, Tangpeng and Chen, Yueguo},
  booktitle={Proc. Interspeech},
  year={2023}
}
```

## License

MIT

## Acknowledgments

- [RMVPE](https://github.com/Dream-High/RMVPE) - Original implementation
- [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) - Voice conversion pipeline
- [MLX](https://github.com/ml-explore/mlx) - Apple's machine learning framework