ailuntz committed · Commit eef7283 · verified · 1 Parent(s): d9aaea5

Add upstream_README.md

  1. upstream_README.md +256 -0
upstream_README.md ADDED
@@ -0,0 +1,256 @@
1
+ <div align="center">
2
+ <h1>🎤 SoulX-Singer</h1>
3
+ <p>
4
+ Official inference code for<br>
5
+ <b><em>SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
6
+ </p>
7
+ <p>
8
+ <img src="assets/soulx-logo.png" alt="SoulX-Logo" style="height:85px;">
9
+ </p>
10
+ <p>
11
+ <a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="Demo Page"></a>
12
+ <a href="https://huggingface.co/spaces/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20HF%20Space-Online%20Demo-ffda16" alt="HF Space Demo"></a>
13
+ <a href="https://huggingface.co/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue' alt="HF-model"></a>
14
+ <a href="assets/technical-report.pdf"><img src="https://img.shields.io/badge/Report-Github-red" alt="Technical Report"></a>
15
+ <a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
16
+ <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue" alt="License"></a>
17
+ </p>
18
+ </div>
19
+
20
+ ---
21
+
22
+ ## 🎵 Overview
23
+
24
+ **SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports **melody-conditioned (F0 contour)** and **score-conditioned (MIDI notes)** control for precise pitch, rhythm, and expression.
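To make the two control modes concrete, here is a minimal sketch of how a score (MIDI notes with timings) relates to a frame-level F0 contour. It uses the standard equal-temperament mapping (A4 = MIDI 69 = 440 Hz). The `(note, start, end)` tuple layout, the 100 frames-per-second rate, and both function names are assumptions made for illustration; they are not SoulX-Singer's actual input format or internal representation.

```python
A4_MIDI = 69    # MIDI note number of A4
A4_HZ = 440.0   # standard concert pitch in Hz


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def notes_to_f0(notes, frame_rate=100.0):
    """Expand (midi_note, start_sec, end_sec) tuples into a per-frame F0 list.

    Frames not covered by any note stay at 0.0, a common convention
    for unvoiced or silent regions.
    """
    if not notes:
        return []
    total = max(end for _, _, end in notes)
    n_frames = int(round(total * frame_rate))
    f0 = [0.0] * n_frames
    for note, start, end in notes:
        first = int(round(start * frame_rate))
        last = min(int(round(end * frame_rate)), n_frames)
        for i in range(first, last):
            f0[i] = midi_to_hz(note)
    return f0


if __name__ == "__main__":
    # Hypothetical three-note score with a short rest before the last note.
    score = [(60, 0.0, 0.5), (62, 0.5, 1.0), (64, 1.2, 1.5)]
    contour = notes_to_f0(score)
    print(len(contour), round(contour[0], 2), contour[110])
```

A real melody-conditioned F0 contour extracted from audio carries vibrato, glides, and expressive detail that this piecewise-constant curve lacks, which is the practical difference between the two conditioning modes.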
25
+
26
+ **SoulX-Singer-SVC** is a singing voice conversion (SVC) model finetuned from **SoulX-Singer**. Singing Voice Conversion aims to transform a source singing recording into the target singer’s voice while preserving the original melody, rhythm, and lyrical content. Based on the strong generative capability of SoulX-Singer, SoulX-Singer-SVC enables high-quality singing voice conversion directly from raw singing audio, without requiring lyric or MIDI transcriptions.
27
+
28
+ ---
29
+
30
+ ## ✨ Key Features
31
+
32
+ #### SoulX-Singer
33
+ - **🎤 Zero-Shot Singing** – Generate high-fidelity voices for unseen singers, no fine-tuning needed.
34
+ - **🎵 Flexible Control Modes** – Melody (F0) and Score (MIDI) conditioning.
35
+ - **📚 Large-Scale Dataset** – 42,000+ hours of aligned vocals, lyrics, and notes across Mandarin, English, and Cantonese.
36
+ - **🧑‍🎤 Timbre Cloning** – Preserve singer identity across languages, styles, and edited lyrics.
37
+ - **✏️ Singing Voice Editing** – Modify lyrics while keeping natural prosody.
38
+ - **🌐 Cross-Lingual Synthesis** – High-fidelity synthesis by disentangling timbre from content.
39
+
40
+ #### SoulX-Singer-SVC
41
+ - **🎙️ Zero-Shot Timbre and Style Transfer** – Transfer singer identity and style to unseen voices without per-speaker fine-tuning.
42
+ - **🌍 Language-Agnostic Conversion** – Works across multilingual singing content.
43
+ - **🔄 Transcription-Free Audio-to-Audio Conversion** – Convert target singing directly without lyrics transcription or MIDI inputs.
44
+
45
+ ---
46
+
47
+ <p align="center">
48
+ <img src="assets/performance_radar.png" width="80%" alt="Performance Radar"/>
49
+ </p>
50
+
51
+ ---
52
+
53
+ ## 🎬 Demo Examples
54
+
55
+ ### Singing Voice Synthesis (SVS)
56
+ <div align="center">
57
+
58
+ <https://github.com/user-attachments/assets/13306f10-3a29-46ba-bcef-d6308d05cbcc>
59
+
60
+ </div>
61
+ <div align="center">
62
+
63
+ <https://github.com/user-attachments/assets/2eb260fe-6f0b-408c-aab8-5b81ddddb284>
64
+
65
+ </div>
66
+
67
+ ### Singing Voice Conversion (SVC)
68
+ <div align="center">
69
+
70
+ <https://github.com/user-attachments/assets/aed15fc9-14c3-44fc-9146-f6d9fef894d3>
71
+
72
+ </div>
73
+
74
+ ---
75
+
76
+ ## 📰 News
77
+ - **[2026-03-16]** [SoulX-Singer-SVC](https://huggingface.co/Soul-AILab/SoulX-Singer/blob/main/model-svc.pt) is released, and [SoulX-Singer Online Demo](https://huggingface.co/spaces/Soul-AILab/SoulX-Singer) has been updated to support singing voice conversion (SVC).
78
+ - **[2026-02-12]** [SoulX-Singer Eval Dataset](https://huggingface.co/datasets/Soul-AILab/SoulX-Singer-Eval-Dataset) is now available on Hugging Face Datasets.
79
+ - **[2026-02-09]** [SoulX-Singer Online Demo](https://huggingface.co/spaces/Soul-AILab/SoulX-Singer) is live on Hugging Face Spaces; try singing voice synthesis in your browser.
80
+ - **[2026-02-08]** [MIDI Editor](https://huggingface.co/spaces/Soul-AILab/SoulX-Singer-Midi-Editor) is available on Hugging Face Spaces.
81
+ - **[2026-02-06]** SoulX-Singer inference code and models released.
82
+
83
+ ---
84
+
85
+ ## 🚀 Quick Start
86
+
87
+ ### 1. Clone Repository
88
+
89
+ ```bash
90
+ git clone https://github.com/Soul-AILab/SoulX-Singer.git
91
+ cd SoulX-Singer
92
+ ```
93
+
94
+ ### 2. Set Up Environment
95
+
96
+ **1. Install Conda** (if not already installed): https://docs.conda.io/en/latest/miniconda.html
97
+
98
+ **2. Create and activate a Conda environment:**
99
+ ```
100
+ conda create -n soulxsinger -y python=3.10
101
+ conda activate soulxsinger
102
+ ```
103
+ **3. Install dependencies:**
104
+ ```
105
+ pip install -r requirements.txt
106
+ ```
107
+ ⚠️ If you are in mainland China, use a PyPI mirror:
108
+ ```
109
+ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
110
+ ```
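After activating the environment, a quick check that the interpreter matches the `python=3.10` pin from the `conda create` command can save debugging time later. This is a minimal sketch: the README only states 3.10 in that command, so whether other versions work is not known, and the helper name `matches_required` is hypothetical.

```python
import sys

# Taken from `python=3.10` in the conda command above.
REQUIRED = (3, 10)


def matches_required(version=None, required=REQUIRED) -> bool:
    """Return True if major.minor of `version` equals `required`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == tuple(required)


if __name__ == "__main__":
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    if matches_required():
        print(f"Python {found} matches the documented environment.")
    else:
        print(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}.")
```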
111
+
112
+
113
+ ---
114
+
115
+ ### 3. Download Pretrained Models
116
+
117
+ Install Hugging Face Hub if needed:
118
+
119
+ ```
120
+ pip install -U huggingface_hub
121
+ ```
122
+
123
+ Download the SVS and SVC models and the preprocessing models:
124
+ ```sh
127
+ # Download the SoulX-Singer SVS and SVC models
128
+ hf download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer
129
+
130
+ # Download models required for preprocessing
131
+ hf download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess
132
+ ```
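Before running the demos, it can help to confirm that both downloads landed where the commands above put them. The sketch below checks only that the two `--local-dir` directories exist and are non-empty. It does not verify individual files, since the exact file list is not given here, and the function name `missing_model_dirs` is hypothetical.

```python
from pathlib import Path

# Directories taken from the `--local-dir` arguments of the commands above.
EXPECTED_DIRS = [
    "pretrained_models/SoulX-Singer",
    "pretrained_models/SoulX-Singer-Preprocess",
]


def missing_model_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or empty under `root`."""
    missing = []
    for rel in expected:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All model directories are present.")
```

Run it from the repository root so the relative paths resolve against the same directory used for `hf download`.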
133
+
134
+
135
+ ### 4. Run the Demo
136
+
137
+ #### Run the SVS inference demo
138
+ ```sh
139
+ bash example/infer.sh
140
+ ```
141
+
142
+ This script relies on metadata generated from the preprocessing pipeline, including vocal separation and transcription. Users should follow the steps in [preprocess](preprocess/README.md) to prepare the necessary metadata before running the demo with their own data.
143
+
144
+ **⚠️ Important Note**
145
+ The metadata produced by the automatic preprocessing pipeline may not perfectly align the singing audio with the corresponding lyrics and musical notes. For best synthesis quality, we strongly recommend manually correcting the alignment using the 🎼 [Midi-Editor](https://huggingface.co/spaces/Soul-AILab/SoulX-Singer-Midi-Editor).
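As a rough first pass before opening the Midi-Editor, obvious timing problems can be flagged automatically. The sketch below assumes each note is a dict with `start` and `end` times in seconds; this layout, the 20 ms minimum duration, and the function name are all hypothetical, because the real metadata schema is defined by the preprocessing pipeline and is not shown here. Adapt the field access to the actual files.

```python
def find_timing_issues(notes, min_duration=0.02):
    """Flag notes that are too short, inverted, or overlap an earlier note.

    `notes` is a list of dicts with hypothetical keys 'start' and 'end'
    (seconds), in the order they appear in the metadata.
    Returns a list of (index, message) tuples.
    """
    issues = []
    prev_end = 0.0
    for i, note in enumerate(notes):
        start, end = note["start"], note["end"]
        if end - start < min_duration:
            issues.append((i, "too short or negative duration"))
        if start < prev_end:
            issues.append((i, "overlaps previous note"))
        prev_end = max(prev_end, end)
    return issues


if __name__ == "__main__":
    # Hypothetical example: note 1 overlaps note 0, note 2 lasts only 5 ms.
    example = [
        {"start": 0.0, "end": 0.5, "lyric": "la"},
        {"start": 0.45, "end": 0.9, "lyric": "la"},
        {"start": 0.9, "end": 0.905, "lyric": "la"},
    ]
    for index, message in find_timing_issues(example):
        print(f"note {index}: {message}")
```

A check like this only catches structural errors. Whether a note actually lines up with the sung syllable still needs to be judged by ear in the editor.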
146
+
147
+ How to use the Midi-Editor:
148
+ - [Editing Metadata with Midi-Editor](preprocess/README.md#L104-L105)
149
+
150
+ #### Run the SVC inference demo
151
+
152
+ ```sh
153
+ bash example/infer_svc.sh
154
+ ```
155
+
156
+ This example performs audio-to-audio SVC, converting the target singing into the prompt timbre using waveform and F0 inputs.
157
+ To prepare your own SVC data, run `example/preprocess.sh` with `midi_transcribe=False`.
158
+
159
+
160
+
161
+ ### 🌐 WebUI
162
+
163
+ You can launch the interactive interface for SVS (synthesis from lyrics and MIDI transcriptions) with:
164
+ ```
165
+ python webui.py
166
+ ```
167
+
168
+ For SVC WebUI (audio-to-audio conversion):
169
+
170
+ ```
171
+ python webui_svc.py
172
+ ```
173
+
174
+
175
+
176
+ ## 🚧 Roadmap
177
+
178
+ - [x] 🖥️ Web-based UI for easy and interactive inference
179
+ - [x] 🌐 Online MIDI Editor deployment on Hugging Face Spaces
180
+ - [x] 🌐 Online demo deployment on Hugging Face Spaces
181
+ - [x] 📊 Release the SoulX-Singer-Eval benchmark
182
+ - [ ] 🎹 Inference support for user-friendly MIDI-based input
183
+ - [ ] 📚 Comprehensive tutorials and usage documentation
184
+ - [x] 🎵 Support for wav-to-wav singing voice conversion (without transcription)
185
+
186
+
187
+ ## 🙏 Acknowledgements
188
+
189
+ Special thanks to the following open-source projects:
190
+
191
+ - [F5-TTS](https://github.com/SWivid/F5-TTS)
192
+ - [Amphion](https://github.com/open-mmlab/Amphion/tree/main)
193
+ - [Music Source Separation Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
194
+ - [Lead Vocal Separation](https://huggingface.co/becruily/mel-band-roformer-karaoke)
195
+ - [Vocal Dereverberation](https://huggingface.co/anvuew/dereverb_mel_band_roformer)
196
+ - [RMVPE](https://github.com/Dream-High/RMVPE)
197
+ - [Paraformer](https://modelscope.cn/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch)
198
+ - [Parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2)
199
+ - [ROSVOT](https://github.com/RickyL-2000/ROSVOT)
200
+
201
+
202
+
203
+ ## 📄 License
204
+
205
+ We use the Apache 2.0 license. Researchers and developers are free to use the code and model weights of SoulX-Singer. See [LICENSE](LICENSE) for details.
206
+
207
+
208
+ ## ⚠️ Usage Disclaimer
209
+
210
+ SoulX-Singer is intended for academic research, educational purposes, and legitimate applications such as personalized singing synthesis and assistive technologies.
211
+
212
+ Please note:
213
+
214
+ - 🎤 Respect intellectual property, privacy, and personal consent when generating singing content.
215
+ - 🚫 Do not use the model to impersonate individuals without authorization or to create deceptive audio.
216
+ - ⚠️ The developers assume no liability for any misuse of this model.
217
+
218
+ We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles. For ethics or misuse concerns, please contact us.
219
+
220
+
221
+ ## 📄 Citation
222
+
223
+ If you use SoulX-Singer in your research, please cite:
224
+
225
+ ```bibtex
226
+ @misc{soulxsinger,
227
+ title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
228
+ author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
229
+ year={2026},
230
+ eprint={2602.07803},
231
+ archivePrefix={arXiv},
232
+ primaryClass={eess.AS},
233
+ url={https://arxiv.org/abs/2602.07803},
234
+ }
235
+ ```
236
+
237
+
238
+ ## 📬 Contact Us
239
+
240
+ We welcome your feedback, questions, and collaboration:
241
+
242
+ - **Email**: qianjiale@soulapp.cn | menghao@soulapp.cn | wangxinsheng@soulapp.cn
243
+
244
+ - **Join discussions**: WeChat or Soul APP groups for technical discussions and updates:
245
+
246
+ <p align="center">
247
+ <!-- <em>Due to group limits, if you can't scan the QR code, please add my WeChat for group access -->
248
+ <!-- : <strong>Tiamo James</strong></em> -->
249
+ <br>
250
+ <span style="display: inline-block; margin-right: 10px;">
251
+ <img src="assets/soul_wechat01.jpg" width="500" alt="WeChat Group QR Code"/>
252
+ </span>
253
+ <!-- <span style="display: inline-block;">
254
+ <img src="assets/wechat_tiamo.jpg" width="300" alt="WeChat QR Code"/>
255
+ </span> -->
256
+ </p>