Update model card with pipeline tag, paper link, and sample usage

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +55 -16
README.md CHANGED
@@ -1,15 +1,16 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
  - en
5
  - zh
6
  tags:
7
  - text-to-audio
8
  - music
9
  - singing-voice-synthesis
10
  - svs
11
- library_name: huggingface_hub
12
- pipeline_tag: text-to-audio
13
  ---
14
 
15
  <div align="center">
@@ -21,23 +22,57 @@ pipeline_tag: text-to-audio
21
  <b><em> Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
22
  </p>
23
  <p>
24
- <img src="assets/soulx-logo.png" alt="SoulX-Podcast_Logo" style="height: 80px;">
25
  </p>
26
  <p>
27
  </p>
28
  <a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
29
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
 
30
  <a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
31
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
32
  </div>
33
 
34
 
35
  ## License
36
 
37
  We use the Apache 2.0 license. Researchers and developers are free to use the codes and model weights of our SoulX-Singer. Check the license at [LICENSE](LICENSE) for more details.
38
 
39
 
40
- ## Usage Disclaimer
41
  This project provides a singing voice synthesis model for vocal generation capable of zero-shot voice cloning, intended for academic research, educational purposes, and legitimate applications, such as personalized vocal synthesis and assistive technologies.
42
 
43
  Please note:
@@ -49,23 +84,27 @@ The developers assume no liability for any misuse of this model.
49
  We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.
50
 
51
 
52
  ## Contact Us
53
  If you are interested in leaving a message to our work, feel free to email qianjiale@soulapp.cn or menghao@soulapp.cn or wangxinsheng@soulapp.cn
54
 
55
  You’re welcome to join our WeChat or Soul APP group for technical discussions, updates.
56
  <p align="center">
57
- <!-- <em>Due to group limits, if you can't scan the QR code, please add my WeChat for group access -->
58
- <!-- : <strong>Tiamo James</strong></em> -->
59
  <br>
60
  <span style="display: inline-block; margin-right: 10px;">
61
  <img src="assets/soul_wechat01.jpg" width="500" alt="WeChat Group QR Code"/>
62
  </span>
63
- <!-- <span style="display: inline-block;">
64
- <img src="assets/wechat_tiamo.jpg" width="300" alt="WeChat QR Code"/>
65
- </span> -->
66
- </p>
67
-
68
- <!-- <p align="center">
69
- <img src="src/figs/npu@aslp.jpeg" width="500"/>
70
- </p -->
71
- <!-- <img src="assets/wechat.jpg -->
 
1
  ---
 
2
  language:
3
  - en
4
  - zh
5
+ library_name: huggingface_hub
6
+ license: apache-2.0
7
+ pipeline_tag: text-to-speech
8
  tags:
9
  - text-to-audio
10
  - music
11
  - singing-voice-synthesis
12
  - svs
13
+ - zero-shot
 
14
  ---
15
 
16
  <div align="center">
 
22
  <b><em> Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
23
  </p>
24
  <p>
25
+ <img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
26
  </p>
27
  <p>
28
  </p>
29
  <a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
30
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
31
+ <a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
32
  <a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
33
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
34
  </div>
35
 
36
+ **SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
37
+
38
+ For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).
39
+
40
+ ## Sample Usage
41
+
42
+ ### 1. Set Up Environment
43
+
44
+ ```bash
45
+ git clone https://github.com/Soul-AILab/SoulX-Singer.git
46
+ cd SoulX-Singer
47
+ conda create -n soulxsinger -y python=3.10
48
+ conda activate soulxsinger
49
+ pip install -r requirements.txt
50
+ ```
51
+
52
+ ### 2. Download Pretrained Models
53
+
54
+ ```bash
55
+ pip install -U huggingface_hub
56
+
57
+ # Download the SoulX-Singer SVS model
58
+ hf download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer
59
+
60
+ # Download models required for preprocessing
61
+ hf download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess
62
+ ```
63
+
64
+ ### 3. Run Inference
65
+
66
+ ```bash
67
+ bash example/infer.sh
68
+ ```
69
 
70
  ## License
71
 
72
  We use the Apache 2.0 license. Researchers and developers are free to use the codes and model weights of our SoulX-Singer. Check the license at [LICENSE](LICENSE) for more details.
73
 
74
 
75
+ ## Usage Disclaimer
76
  This project provides a singing voice synthesis model for vocal generation capable of zero-shot voice cloning, intended for academic research, educational purposes, and legitimate applications, such as personalized vocal synthesis and assistive technologies.
77
 
78
  Please note:
 
84
  We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.
85
 
86
 
87
+ ## Citation
88
+
89
+ ```bibtex
90
+ @misc{soulxsinger,
91
+ title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
92
+ author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
93
+ year={2026},
94
+ eprint={2602.07803},
95
+ archivePrefix={arXiv},
96
+ primaryClass={eess.AS},
97
+ url={https://arxiv.org/abs/2602.07803},
98
+ }
99
+ ```
100
+
101
  ## Contact Us
102
  If you are interested in leaving a message to our work, feel free to email qianjiale@soulapp.cn or menghao@soulapp.cn or wangxinsheng@soulapp.cn
103
 
104
  You’re welcome to join our WeChat or Soul APP group for technical discussions, updates.
105
  <p align="center">
106
  <br>
107
  <span style="display: inline-block; margin-right: 10px;">
108
  <img src="assets/soul_wechat01.jpg" width="500" alt="WeChat Group QR Code"/>
109
  </span>
110
+ </p>
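
For reference, the two `hf download` commands added in the "Download Pretrained Models" step can likely also be run from Python through `huggingface_hub.snapshot_download`. The following is a minimal sketch, assuming the same repository IDs and target directories as the README. The helper names `download_plan` and `download_all` are hypothetical and are not part of the SoulX-Singer repository.

```python
from pathlib import Path

# Repository IDs and target directories copied from the README's download step.
REPOS = {
    "Soul-AILab/SoulX-Singer": "pretrained_models/SoulX-Singer",
    "Soul-AILab/SoulX-Singer-Preprocess": "pretrained_models/SoulX-Singer-Preprocess",
}


def download_plan(root="."):
    """Return (repo_id, local_dir) pairs without touching the network."""
    return [(repo_id, str(Path(root) / rel)) for repo_id, rel in REPOS.items()]


def download_all(root="."):
    """Download every repository in the plan (hypothetical helper)."""
    # Imported lazily so the plan can be inspected even when
    # huggingface_hub is not installed yet.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(root):
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_all()` from the root of the cloned repository should place the weights in the same `pretrained_models/` layout that the CLI commands produce. `download_plan()` can be used on its own to check the target paths first.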