RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music
Paper: arXiv:2306.15412
MLX implementation of RMVPE (Robust Model for Vocal Pitch Estimation) for Apple Silicon.
RMVPE extracts the fundamental frequency (F0) from audio, which is essential for preserving pitch and melody in voice conversion. Unlike general-purpose pitch trackers such as CREPE and pYIN, which target monophonic audio, RMVPE is designed specifically for polyphonic music. This makes it well suited to singing voice conversion, where background music may be present.
pip install mlx-rmvpe
import librosa
from mlx_rmvpe import RMVPE
# Load model (auto-downloads weights)
model = RMVPE.from_pretrained()
# Load audio at 16kHz
audio, sr = librosa.load("singing.wav", sr=16000, mono=True)
# Extract F0
f0 = model.infer_from_audio(audio)
print(f"F0 shape: {f0.shape} at 100fps")
print(f"Pitch range: {f0[f0 > 0].min():.1f} - {f0[f0 > 0].max():.1f} Hz")
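The F0 track is often easier to work with once it has timestamps and a musical pitch scale. The following is a minimal sketch, not part of the `mlx_rmvpe` API: `summarize_f0` is a hypothetical helper, and it assumes that `f0` is a 1-D array (or convertible to one with NumPy), that unvoiced frames are reported as 0 Hz (as the `f0 > 0` filter in the example above suggests), and that frames arrive at the 100 fps rate mentioned there.

```python
import numpy as np

FRAME_RATE = 100  # frames per second, taken from the quick-start example


def summarize_f0(f0, frame_rate=FRAME_RATE):
    """Hypothetical helper: add timestamps and MIDI pitch to an F0 track.

    Assumes frames with f0 <= 0 are unvoiced.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0

    # Timestamp (in seconds) of each frame.
    times = np.arange(f0.shape[0]) / frame_rate

    # Convert Hz to fractional MIDI note numbers (A4 = 440 Hz = MIDI 69).
    # Unvoiced frames are left as NaN so they are easy to mask out later.
    midi = np.full(f0.shape, np.nan)
    midi[voiced] = 69.0 + 12.0 * np.log2(f0[voiced] / 440.0)

    voiced_ratio = float(voiced.mean()) if f0.size else 0.0
    return {"times": times, "midi": midi, "voiced_ratio": voiced_ratio}


if __name__ == "__main__":
    summary = summarize_f0([0.0, 440.0, 880.0])
    print(summary["midi"], summary["voiced_ratio"])
```

Keeping unvoiced frames as NaN rather than dropping them preserves the alignment between `times` and `midi`, which matters when the pitch curve is later resampled or plotted against the audio.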
from huggingface_hub import hf_hub_download
from mlx_rmvpe import RMVPE
weights_path = hf_hub_download(
repo_id="lexandstuff/mlx-rmvpe",
filename="rmvpe.safetensors"
)
model = RMVPE()
model.load_weights(weights_path)
model.eval()
This implementation is converted from the PyTorch weights and produces numerically similar outputs:
| Metric | Value |
|---|---|
| Mean F0 difference | 1.29 Hz |
| Correlation | >0.99 |
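For readers who want to run a similar comparison on their own audio, here is one way such metrics could be computed. This is a sketch under assumptions: the source does not state how the figures above were obtained, so the choice of mean absolute difference, Pearson correlation, and restricting the comparison to frames that both tracks mark as voiced is illustrative only. `compare_f0` is a hypothetical helper, not part of the package.

```python
import numpy as np


def compare_f0(f0_ref, f0_test):
    """Hypothetical helper: compare two F0 tracks in Hz.

    Returns (mean absolute difference, Pearson correlation), computed
    only over frames where both tracks are voiced (f0 > 0). This
    voicing convention is an assumption.
    """
    f0_ref = np.asarray(f0_ref, dtype=np.float64)
    f0_test = np.asarray(f0_test, dtype=np.float64)

    # Trim to a common length in case the frame counts differ slightly.
    n = min(f0_ref.shape[0], f0_test.shape[0])
    f0_ref, f0_test = f0_ref[:n], f0_test[:n]

    both_voiced = (f0_ref > 0) & (f0_test > 0)
    if both_voiced.sum() < 2:
        raise ValueError("need at least two frames voiced in both tracks")

    ref, test = f0_ref[both_voiced], f0_test[both_voiced]
    mean_abs_diff = float(np.mean(np.abs(ref - test)))
    correlation = float(np.corrcoef(ref, test)[0, 1])
    return mean_abs_diff, correlation


if __name__ == "__main__":
    reference = [0.0, 100.0, 200.0, 300.0]
    candidate = [0.0, 101.0, 201.0, 301.0, 999.0]
    print(compare_f0(reference, candidate))
```

Here `f0_ref` would come from the PyTorch reference model and `f0_test` from the MLX model, both run on the same 16 kHz audio.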
See the GitHub repository for implementation details and the full API reference.
@inproceedings{wei2023rmvpe,
title={RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music},
author={Wei, Haojie and Cao, Xueke and Dan, Tangpeng and Chen, Yueguo},
booktitle={Proc. Interspeech},
year={2023}
}
License: MIT