Add model card for SoulX-Singer
#1
by nielsr (HF Staff) - opened
README.md
CHANGED
@@ -1,3 +1,44 @@
----
-license: apache-2.0
----
---
license: apache-2.0
pipeline_tag: audio-to-audio
---

# SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis (SVS) model that generates realistic singing voices for unseen singers. It supports controllable singing generation conditioned on either symbolic musical scores (MIDI) or melodic representations (F0 contours), enabling flexible and expressive control in real-world production workflows.

[**Project Page**](https://soul-ailab.github.io/soulx-singer) | [**Paper (arXiv)**](https://arxiv.org/abs/2602.07803) | [**GitHub**](https://github.com/Soul-AILab/SoulX-Singer) | [**Online Demo**](https://huggingface.co/spaces/Soul-AILab/SoulX-Singer)
## Key Features

- **🎤 Zero-Shot Singing**: Generates high-fidelity voices for unseen singers without any fine-tuning.
- **🎵 Flexible Control Modes**: Supports both melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control.
- **📚 Large-Scale Dataset**: Trained on over 42,000 hours of aligned vocals, lyrics, and notes across Mandarin, English, and Cantonese.
- **🧑‍🎤 Timbre Cloning**: Preserves singer identity across different languages, styles, and edited lyrics.
- **✏️ Singing Voice Editing**: Allows lyrics to be modified while maintaining natural prosody.
- **🌍 Cross-Lingual Synthesis**: Disentangles timbre from content to ensure high-fidelity synthesis across supported languages.

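The two control modes above differ mainly in how pitch is specified: a MIDI note in a score is a discrete pitch index, while an F0 contour is a continuous frequency track in Hz. As background (this is the standard equal-temperament conversion, not code from SoulX-Singer), the mapping between the two representations is:

```python
def midi_to_f0(midi_note: float) -> float:
    """Convert a MIDI note number to its fundamental frequency in Hz,
    assuming equal temperament with A4 (MIDI note 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)

print(midi_to_f0(69))  # A4 -> 440.0 Hz
print(midi_to_f0(60))  # middle C -> ~261.63 Hz
```

An F0 contour carries more information than a note sequence (vibrato, portamento, pitch bends), which is why the two conditioning modes trade off precision of control against ease of authoring.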
## Usage

For detailed installation and inference instructions, please refer to the [official GitHub repository](https://github.com/Soul-AILab/SoulX-Singer).

The repository provides a `webui.py` for an interactive interface and example scripts for batch inference.

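As a minimal getting-started sketch (the dependency file name is an assumption based on common repository layouts; only `webui.py` is confirmed by the description above, so treat the GitHub README as authoritative):

```shell
# Clone the official repository linked above
git clone https://github.com/Soul-AILab/SoulX-Singer.git
cd SoulX-Singer

# Install Python dependencies (requirements file name is an assumption)
pip install -r requirements.txt

# Launch the interactive web interface mentioned above
python webui.py
```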
## Citation

If you use SoulX-Singer in your research, please cite the following:

```bibtex
@misc{soulxsinger,
      title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
      author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
      year={2026},
      eprint={2602.07803},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2602.07803}
}
```

---

*SoulX-Singer is intended for academic research and legitimate applications. Please respect intellectual property and privacy when generating content.*