menghao committed on
Commit adb817e · 1 Parent(s): 1218673

Update README.md based on nielsr's suggestions

Files changed (1)
  1. README.md +58 -23
README.md CHANGED
@@ -1,71 +1,106 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
  - en
5
  - zh
6
  tags:
7
  - text-to-audio
8
  - music
9
  - singing-voice-synthesis
10
  - svs
11
- library_name: huggingface_hub
12
- pipeline_tag: text-to-speech
13
  ---
14
 
15
  <div align="center">
16
- <h1>
17
- SoulX-Singer
18
- </h1>
19
- <p>
20
- <br>
21
  <b><em> Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
22
  </p>
23
  <p>
24
- <img src="assets/soulx-logo.png" alt="SoulX-Podcast_Logo" style="height: 80px;">
25
  </p>
26
  <p>
27
  </p>
28
  <a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
29
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
 
30
  <a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
31
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
32
  </div>
33
 
34
 
35
  ## License
36
 
37
  We use the Apache 2.0 license. Researchers and developers are free to use the code and model weights of SoulX-Singer. See [LICENSE](LICENSE) for details.
38
 
39
 
40
- ## Usage Disclaimer
41
  This project provides a singing voice synthesis model for vocal generation capable of zero-shot voice cloning, intended for academic research, educational purposes, and legitimate applications, such as personalized vocal synthesis and assistive technologies.
42
 
43
  Please note:
 
44
 
45
- Users of SoulX-Singer are strongly encouraged to respect intellectual property, privacy, and personal consent when generating singing content. The system should not be used to impersonate individuals without authorization, nor to produce deceptive or misleading audio content.
46
-
47
- The developers assume no liability for any misuse of this model.
48
 
49
- We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.
50
 
51
 
52
  ## Contact Us
53
  To leave a message about our work, feel free to email qianjiale@soulapp.cn, menghao@soulapp.cn, or wangxinsheng@soulapp.cn.
54
 
55
  You’re welcome to join our WeChat or Soul APP group for technical discussions and updates.
56
  <p align="center">
57
- <!-- <em>Due to group limits, if you can't scan the QR code, please add my WeChat for group access -->
58
- <!-- : <strong>Tiamo James</strong></em> -->
59
  <br>
60
  <span style="display: inline-block; margin-right: 10px;">
61
  <img src="assets/soul_wechat01.jpg" width="500" alt="WeChat Group QR Code"/>
62
  </span>
63
- <!-- <span style="display: inline-block;">
64
- <img src="assets/wechat_tiamo.jpg" width="300" alt="WeChat QR Code"/>
65
- </span> -->
66
  </p>
67
 
68
- <!-- <p align="center">
69
- <img src="src/figs/npu@aslp.jpeg" width="500"/>
70
- </p -->
71
- <!-- <img src="assets/wechat.jpg -->
 
1
  ---
2
+
3
  language:
4
  - en
5
  - zh
6
+ library_name: huggingface_hub
7
+ license: apache-2.0
8
+ pipeline_tag: text-to-speech
9
  tags:
10
  - text-to-audio
11
  - music
12
  - singing-voice-synthesis
13
  - svs
14
+ - zero-shot
15
+
16
  ---
17
 
18
  <div align="center">
19
  <b><em> Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
20
  </p>
21
  <p>
22
+ <img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
23
  </p>
24
  <p>
25
  </p>
26
  <a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
27
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
28
+ <a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
29
  <a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
30
  <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
31
  </div>
32
 
33
+ **SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
34
+
35
+ For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).
36
+
37
+ ## Sample Usage
38
+
39
+ ### 1. Set Up Environment
40
+
41
+ ```bash
42
+ git clone https://github.com/Soul-AILab/SoulX-Singer.git
43
+ cd SoulX-Singer
44
+ conda create -n soulxsinger -y python=3.10
45
+ conda activate soulxsinger
46
+ pip install -r requirements.txt
47
+ ```
48
+
49
+ ### 2. Download Pretrained Models
50
+
51
+ ```bash
52
+ pip install -U huggingface_hub
53
+
54
+ # Download the SoulX-Singer SVS model
55
+ hf download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer
56
+
57
+ # Download models required for preprocessing
58
+ hf download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess
59
+ ```
60
+
61
+ ### 3. Run Inference
62
+
63
+ ```bash
64
+ bash example/infer.sh
65
+ ```
66
 
67
  ## License
68
 
69
  We use the Apache 2.0 license. Researchers and developers are free to use the code and model weights of SoulX-Singer. See [LICENSE](LICENSE) for details.
70
 
71
 
72
+ ## Usage Disclaimer
73
  This project provides a singing voice synthesis model for vocal generation capable of zero-shot voice cloning, intended for academic research, educational purposes, and legitimate applications, such as personalized vocal synthesis and assistive technologies.
74
 
75
  Please note:
76
+ We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.
77
 
78
 
79
+ ## Citation
80
 
81
+ ```bibtex
82
+ @misc{soulxsinger,
83
+ title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
84
+ author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
85
+ year={2026},
86
+ eprint={2602.07803},
87
+ archivePrefix={arXiv},
88
+ primaryClass={eess.AS},
89
+ url={https://arxiv.org/abs/2602.07803},
90
+ }
91
+ ```
92
 
93
  ## Contact Us
94
  To leave a message about our work, feel free to email qianjiale@soulapp.cn, menghao@soulapp.cn, or wangxinsheng@soulapp.cn.
95
 
96
  You’re welcome to join our WeChat or Soul APP group for technical discussions and updates.
97
  <p align="center">
98
+
99
+
100
  <br>
101
  <span style="display: inline-block; margin-right: 10px;">
102
  <img src="assets/soul_wechat01.jpg" width="500" alt="WeChat Group QR Code"/>
103
  </span>
104
  </p>
105
 
106
+
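
A note on the Sample Usage steps added in this commit: `example/infer.sh` presumably expects both download targets from step 2 to be in place, so a small pre-flight check can catch a skipped or failed download early. The sketch below is not part of the SoulX-Singer repository. The function names `check_model_dir` and `check_all_models` are hypothetical, and the check only confirms that each directory exists and is non-empty, not that the weights are complete or valid.

```shell
#!/usr/bin/env bash
# Pre-flight check for downloaded model directories.
# Hypothetical helper for illustration; not shipped with SoulX-Singer.

# Return 0 if the given directory exists and contains at least one entry.
check_model_dir() {
  local dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Return 0 only when every directory passed as an argument looks populated.
# Each missing or empty directory is reported on stderr.
check_all_models() {
  local missing=0
  local dir
  for dir in "$@"; do
    if ! check_model_dir "$dir"; then
      echo "missing or empty: $dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Assuming the directory names from step 2, the functions could be sourced and used as `check_all_models pretrained_models/SoulX-Singer pretrained_models/SoulX-Singer-Preprocess && bash example/infer.sh`, so inference starts only when both directories are populated.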