Commit 59b0ccc by lexandstuff (verified), 1 parent: ba61207

Add model card

Files changed (1)
  1. README.md +98 -0
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ license: mit
+ tags:
+ - mlx
+ - audio
+ - pitch-estimation
+ - f0
+ - voice-conversion
+ - rvc
+ - apple-silicon
+ library_name: mlx
+ pipeline_tag: audio-to-audio
+ ---
+
+ # MLX-RMVPE
+
+ MLX implementation of [RMVPE](https://arxiv.org/abs/2306.15412) (Robust Model for Vocal Pitch Estimation) for Apple Silicon.
+
+ ## Model Description
+
+ RMVPE extracts the **fundamental frequency (F0)** from audio, which voice conversion needs in order to preserve pitch and melody. Unlike pitch trackers built for monophonic input (such as CREPE and pYIN), RMVPE is designed for **polyphonic music**, which suits it to singing voice conversion where background music may be present.
+
+ - **Architecture**: Deep U-Net with BiGRU layers
+ - **Parameters**: ~15.4M
+ - **Input**: 16 kHz audio
+ - **Output**: F0 in Hz at 100 fps (hop_length=160)
+ - **Pitch range**: ~32 Hz to ~1975 Hz (360 bins)
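
The figures in the list above fit together arithmetically, and a short calculation makes that visible. This is a minimal sketch, assuming frames are spaced exactly `hop_length` samples apart and, purely for illustration, that the 360 bins are evenly spaced on a log-frequency axis between the quoted endpoints. The actual bin layout is defined by the model code, not by this snippet, and all function names here are hypothetical.

```python
import math

SAMPLE_RATE = 16_000          # Hz, from the model description
HOP_LENGTH = 160              # samples between frames, from the model description
N_BINS = 360                  # pitch bins, from the model description
F_MIN, F_MAX = 32.0, 1975.0   # approximate endpoints quoted in the model description


def frames_per_second(sample_rate: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> float:
    """Frame rate implied by the sample rate and hop length."""
    return sample_rate / hop


def frame_time(index: int, sample_rate: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> float:
    """Start time in seconds of frame `index`, assuming frame 0 starts at t = 0."""
    return index * hop / sample_rate


def cents_per_bin(f_min: float = F_MIN, f_max: float = F_MAX, n_bins: int = N_BINS) -> float:
    """Bin spacing in cents if the bins were log-spaced between f_min and f_max (assumption)."""
    total_cents = 1200.0 * math.log2(f_max / f_min)
    return total_cents / (n_bins - 1)


print(frames_per_second())   # 16000 / 160 = 100.0 frames per second
print(frame_time(250))       # frame 250 starts at 2.5 seconds
print(round(cents_per_bin(), 1))
```

Under these assumptions the spacing comes out close to 20 cents per bin, that is, roughly a fifth of a semitone.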
+
+ ## Usage
+
+ ```bash
+ pip install mlx-rmvpe
+ ```
+
+ ```python
+ import librosa
+ from mlx_rmvpe import RMVPE
+
+ # Load model (auto-downloads weights)
+ model = RMVPE.from_pretrained()
+
+ # Load audio at 16kHz
+ audio, sr = librosa.load("singing.wav", sr=16000, mono=True)
+
+ # Extract F0
+ f0 = model.infer_from_audio(audio)
+
+ print(f"F0 shape: {f0.shape} at 100fps")
+ print(f"Pitch range: {f0[f0 > 0].min():.1f} - {f0[f0 > 0].max():.1f} Hz")
+ ```
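
The usage snippet filters with `f0 > 0`, which suggests that unvoiced frames are reported as 0 Hz. Assuming that convention, and assuming the returned track can be converted with `np.asarray`, the sketch below turns an F0 track into MIDI note numbers for downstream pitch shifting or plotting. The helper `hz_to_midi_track` is hypothetical and not part of `mlx-rmvpe`. A synthetic array stands in for real model output so the example runs on its own.

```python
import numpy as np


def hz_to_midi_track(f0_hz) -> np.ndarray:
    """Convert an F0 track in Hz to MIDI note numbers, with NaN for unvoiced frames."""
    f0 = np.asarray(f0_hz, dtype=np.float64)
    midi = np.full(f0.shape, np.nan)
    voiced = f0 > 0  # assumption: 0 Hz marks an unvoiced frame
    # Standard conversion: A4 = 440 Hz = MIDI note 69, 12 notes per octave.
    midi[voiced] = 69.0 + 12.0 * np.log2(f0[voiced] / 440.0)
    return midi


# Synthetic stand-in for model output: unvoiced, A3, A4, A5, unvoiced.
f0_example = np.array([0.0, 220.0, 440.0, 880.0, 0.0])
midi = hz_to_midi_track(f0_example)
print(midi)  # 220 Hz -> 57 (A3), 440 Hz -> 69 (A4), 880 Hz -> 81 (A5)
```

Keeping unvoiced frames as NaN rather than 0 avoids treating silence as a very low note when averaging or plotting.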
+
+ ## Manual Loading
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from mlx_rmvpe import RMVPE
+
+ weights_path = hf_hub_download(
+     repo_id="lexandstuff/mlx-rmvpe",
+     filename="rmvpe.safetensors"
+ )
+
+ model = RMVPE()
+ model.load_weights(weights_path)
+ model.eval()
+ ```
+
+ ## Technical Details
+
+ The weights are converted from the original PyTorch checkpoint, and the MLX outputs are numerically close to the PyTorch ones:
+
+ | Metric | Value |
+ |--------|-------|
+ | Mean F0 difference | 1.29 Hz |
+ | Correlation | > 0.99 |
+
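
The model card does not say how the two metrics in the table were computed. One plausible way to produce numbers of this kind is sketched below: the mean absolute difference in Hz and the Pearson correlation, both restricted to frames that are voiced in both tracks. The function `compare_f0_tracks` and the sample arrays are hypothetical illustrations, not the procedure used for the published values.

```python
import numpy as np


def compare_f0_tracks(f0_a, f0_b) -> tuple[float, float]:
    """Return (mean absolute difference in Hz, Pearson correlation) over jointly voiced frames."""
    a = np.asarray(f0_a, dtype=np.float64)
    b = np.asarray(f0_b, dtype=np.float64)
    n = min(len(a), len(b))  # tolerate a frame-count mismatch between implementations
    a, b = a[:n], b[:n]
    both_voiced = (a > 0) & (b > 0)  # assumption: 0 Hz marks an unvoiced frame
    if both_voiced.sum() < 2:
        raise ValueError("need at least two frames voiced in both tracks")
    mean_abs_diff = float(np.mean(np.abs(a[both_voiced] - b[both_voiced])))
    correlation = float(np.corrcoef(a[both_voiced], b[both_voiced])[0, 1])
    return mean_abs_diff, correlation


# Made-up tracks standing in for PyTorch and MLX outputs on the same audio.
reference = np.array([0.0, 220.0, 222.0, 230.0, 0.0, 440.0])
candidate = np.array([0.0, 221.0, 221.0, 232.0, 110.0, 441.0])
diff_hz, corr = compare_f0_tracks(reference, candidate)
print(f"mean |dF0| = {diff_hz:.2f} Hz, correlation = {corr:.4f}")
```

Frames where only one track is voiced are excluded here; counting them separately as voicing disagreements would be a reasonable extension.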
+ See the [GitHub repository](https://github.com/lexandstuff/mlx-rmvpe) for implementation details and the full API reference.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{wei2023rmvpe,
+   title={RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music},
+   author={Wei, Haojie and Cao, Xueke and Dan, Tangpeng and Chen, Yueguo},
+   booktitle={Interspeech},
+   year={2023}
+ }
+ ```
+
+ ## License
+
+ MIT
+
+ ## Acknowledgments
+
+ - [RMVPE](https://github.com/Dream-High/RMVPE) - Original implementation
+ - [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) - Voice conversion pipeline
+ - [MLX](https://github.com/ml-explore/mlx) - Apple's machine learning framework