metadata

license: mit
tags:
  - mlx
  - audio
  - pitch-estimation
  - f0
  - voice-conversion
  - rvc
  - apple-silicon
library_name: mlx
pipeline_tag: audio-to-audio

MLX-RMVPE

MLX implementation of RMVPE (Robust Model for Vocal Pitch Estimation) for Apple Silicon.

Model Description

RMVPE extracts the fundamental frequency (F0) from audio, which voice conversion needs in order to preserve pitch and melody. Unlike monophonic pitch trackers such as CREPE and pYIN, RMVPE is designed specifically for polyphonic music, which makes it well suited to singing voice conversion where background music may be present.

Architecture: Deep U-Net with BiGRU layers
Parameters: ~15.4M
Input: 16 kHz audio
Output: F0 in Hz at 100 fps (hop_length=160)
Pitch range: ~32 Hz to ~1975 Hz (360 bins)
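
The figures in the list above are related by simple arithmetic, sketched below. The frame rate follows directly from the sample rate and hop length. The per-bin resolution is only an estimate: it assumes the 360 bins are evenly spaced on a logarithmic (cents) scale between the approximate range limits, which the model card does not state. The helper `approx_num_frames` is hypothetical, and the exact frame count depends on how the implementation pads the signal.

```python
import math

SAMPLE_RATE_HZ = 16_000   # input sample rate from the list above
HOP_LENGTH = 160          # samples between consecutive frames
NUM_BINS = 360            # number of pitch bins
F_MIN_HZ = 32.0           # approximate lower limit ("~32 Hz")
F_MAX_HZ = 1975.0         # approximate upper limit ("~1975 Hz")

# Frame rate and frame period follow directly from the hop length.
frames_per_second = SAMPLE_RATE_HZ / HOP_LENGTH          # 100.0
frame_period_ms = 1000.0 * HOP_LENGTH / SAMPLE_RATE_HZ   # 10.0


def approx_num_frames(duration_s: float) -> int:
    """Rough number of F0 frames for a clip (ignores padding details)."""
    return round(duration_s * frames_per_second)


# Assumption: bins are evenly spaced in cents across the stated range.
span_cents = 1200.0 * math.log2(F_MAX_HZ / F_MIN_HZ)
cents_per_bin = span_cents / (NUM_BINS - 1)

print(f"{frames_per_second:.0f} fps, {frame_period_ms:.0f} ms per frame")
print(f"a 3 s clip yields roughly {approx_num_frames(3.0)} frames")
print(f"about {cents_per_bin:.1f} cents per bin")
```

Under that spacing assumption, each bin covers roughly a fifth of a semitone.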

Usage

pip install mlx-rmvpe

import librosa
from mlx_rmvpe import RMVPE

# Load model (auto-downloads weights)
model = RMVPE.from_pretrained()

# Load audio at 16kHz
audio, sr = librosa.load("singing.wav", sr=16000, mono=True)

# Extract F0
f0 = model.infer_from_audio(audio)

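print(f"F0 shape: {f0.shape} at 100fps")

# Frames with f0 == 0 are skipped; guard against clips with no voiced frames
voiced = f0[f0 > 0]
if voiced.size:
    print(f"Pitch range: {voiced.min():.1f} - {voiced.max():.1f} Hz")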
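
Downstream code often needs the F0 track on a time axis or as musical pitch. The sketch below is one way to do that with NumPy alone. It assumes the F0 track is a 1-D array in Hz at 100 frames per second, with non-positive values marking unvoiced frames (as the `f0 > 0` filter above suggests). The helpers `f0_to_midi` and `frame_times` are illustrative names, not part of `mlx_rmvpe`, and a short synthetic array stands in for real model output.

```python
import numpy as np

FRAME_RATE = 100  # frames per second, per the model description


def f0_to_midi(f0_hz):
    """Convert F0 in Hz to fractional MIDI note numbers.

    Unvoiced frames (f0 <= 0) are returned as NaN.
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    midi = np.full(f0_hz.shape, np.nan)
    voiced = f0_hz > 0
    # MIDI note 69 is A4 = 440 Hz; 12 semitones per octave.
    midi[voiced] = 69.0 + 12.0 * np.log2(f0_hz[voiced] / 440.0)
    return midi


def frame_times(num_frames, frame_rate=FRAME_RATE):
    """Start time in seconds of each frame."""
    return np.arange(num_frames) / frame_rate


# Synthetic stand-in for model output: unvoiced, A3, A4, A5, unvoiced.
f0 = np.array([0.0, 220.0, 440.0, 880.0, 0.0])

midi = f0_to_midi(f0)
times = frame_times(len(f0))
voiced_ratio = float(np.mean(f0 > 0))

for t, hz, note in zip(times, f0, midi):
    label = "unvoiced" if np.isnan(note) else f"MIDI {note:.1f}"
    print(f"{t:.2f} s  {hz:7.1f} Hz  {label}")
print(f"voiced frames: {voiced_ratio:.0%}")
```

Using NaN for unvoiced frames keeps them out of later statistics (for example via `np.nanmean`) without discarding their position in time.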

Manual Loading

from huggingface_hub import hf_hub_download
from mlx_rmvpe import RMVPE

weights_path = hf_hub_download(
    repo_id="lexandstuff/mlx-rmvpe",
    filename="rmvpe.safetensors"
)

model = RMVPE()
model.load_weights(weights_path)
model.eval()

Technical Details

The MLX weights are converted from the PyTorch weights, and the model's outputs closely match the PyTorch reference:

| Metric | Value |
| --- | --- |
| Mean F0 difference | 1.29 Hz |
| Correlation | >0.99 |
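
The model card does not say exactly how these metrics were computed. As a rough sketch, a comparison of this kind could be run as follows, assuming the mean difference is a mean absolute difference in Hz and that both metrics are taken over frames that both models mark as voiced. The function `compare_f0` is hypothetical, and the two tracks are synthetic stand-ins rather than real PyTorch or MLX outputs.

```python
import numpy as np


def compare_f0(f0_ref, f0_test):
    """Return (mean absolute difference in Hz, Pearson correlation).

    Only frames where both tracks are voiced (f0 > 0) are compared.
    """
    f0_ref = np.asarray(f0_ref, dtype=np.float64)
    f0_test = np.asarray(f0_test, dtype=np.float64)

    # Trim to a common length in case the frame counts differ slightly.
    n = min(len(f0_ref), len(f0_test))
    f0_ref, f0_test = f0_ref[:n], f0_test[:n]

    both_voiced = (f0_ref > 0) & (f0_test > 0)
    if both_voiced.sum() < 2:
        raise ValueError("need at least two frames voiced in both tracks")

    ref_v, test_v = f0_ref[both_voiced], f0_test[both_voiced]
    mean_abs_diff = float(np.mean(np.abs(ref_v - test_v)))
    correlation = float(np.corrcoef(ref_v, test_v)[0, 1])
    return mean_abs_diff, correlation


# Synthetic example: a rising pitch contour with unvoiced edges,
# and a second track that differs by small random noise.
rng = np.random.default_rng(0)
f0_ref = np.concatenate([np.zeros(10), np.linspace(200.0, 400.0, 80), np.zeros(10)])
f0_test = f0_ref.copy()
f0_test[f0_test > 0] += rng.normal(0.0, 1.0, size=80)

mean_abs_diff, correlation = compare_f0(f0_ref, f0_test)
print(f"mean |dF0| = {mean_abs_diff:.2f} Hz, correlation = {correlation:.4f}")
```

Restricting the comparison to frames voiced in both tracks avoids large spurious differences where the two models disagree only on voicing.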

See the GitHub repository for implementation details and the full API reference.

Citation

@inproceedings{wei2023rmvpe,
  title={RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music},
  author={Wei, Yongmao and others},
  booktitle={ISMIR},
  year={2023}
}

License

MIT

Acknowledgments

RMVPE - Original implementation
RVC - Voice conversion pipeline
MLX - Apple's machine learning framework