williamchangtw Xinsheng-Wang committed on
Commit b645b2d · 0 Parent(s)

Duplicate from Soul-AILab/SoulX-Singer


Co-authored-by: Xinsheng Wang <Xinsheng-Wang@users.noreply.huggingface.co>

Files changed (6)
  1. .gitattributes +38 -0
  2. README.md +106 -0
  3. assets/soul_wechat01.jpg +3 -0
  4. assets/soulx-logo.png +3 -0
  5. config.yaml +37 -0
  6. model.pt +3 -0
.gitattributes ADDED
@@ -0,0 +1,38 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/logo_fixed.png filter=lfs diff=lfs merge=lfs -text
+assets/soul_wechat01.jpg filter=lfs diff=lfs merge=lfs -text
+assets/soulx-logo.png filter=lfs diff=lfs merge=lfs -text
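To check which paths the patterns above send through Git LFS, the sketch below approximates gitattributes matching with Python's `fnmatch`. It is a simplification under stated assumptions: only a subset of the slash-free and literal-path rules is included, and the `saved_model/**/*` rule and other gitattributes subtleties are not modelled. `is_lfs_tracked` and `LFS_PATTERNS` are illustrative names, not part of the repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the LFS patterns from the .gitattributes above (illustrative only).
LFS_PATTERNS = (
    "*.pt",
    "*.safetensors",
    "*.gz",
    "*.tar.*",
    "*tfevents*",
    "assets/soul_wechat01.jpg",
    "assets/soulx-logo.png",
)


def is_lfs_tracked(path: str, patterns: tuple[str, ...] = LFS_PATTERNS) -> bool:
    """Approximate whether a repo-relative `path` hits one of the LFS rules.

    Slash-free patterns are matched against the file name only, which mirrors
    git's behaviour for such patterns. Patterns containing a slash are matched
    against the whole path; fnmatch lets '*' cross '/' there (unlike git) and
    '**' is not handled, so only literal paths are safe in that branch.
    """
    p = PurePosixPath(path)
    for pattern in patterns:
        target = str(p) if "/" in pattern else p.name
        if fnmatchcase(target, pattern):
            return True
    return False
```

For an authoritative answer inside a clone, `git check-attr filter -- model.pt` reports the `filter` attribute git actually resolves for that path, and `git lfs track` with no arguments lists the tracked patterns.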
README.md ADDED
@@ -0,0 +1,106 @@
+---
+
+language:
+- en
+- zh
+library_name: huggingface_hub
+license: apache-2.0
+pipeline_tag: text-to-speech
+tags:
+- text-to-audio
+- music
+- singing-voice-synthesis
+- svs
+- zero-shot
+
+---
+
+<div align="center">
+<p><b><em> Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
+</p>
+<p>
+<img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
+</p>
+<p>
+</p>
+<a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
+<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
+<a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
+<a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
+<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
+</div>
+
+**SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
+
+For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).
+
+## Sample Usage
+
+### 1. Set Up Environment
+
+```bash
+git clone https://github.com/Soul-AILab/SoulX-Singer.git
+cd SoulX-Singer
+conda create -n soulxsinger -y python=3.10
+conda activate soulxsinger
+pip install -r requirements.txt
+```
+
+### 2. Download Pretrained Models
+
+```bash
+pip install -U huggingface_hub
+
+# Download the SoulX-Singer SVS model
+hf download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer
+
+# Download models required for preprocessing
+hf download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess
+```
+
+### 3. Run Inference
+
+```bash
+bash example/infer.sh
+```
+
+## License
+
+We use the Apache 2.0 license. Researchers and developers are free to use the code and model weights of SoulX-Singer. See [LICENSE](LICENSE) for more details.
+
+
+## Usage Disclaimer
+This project provides a singing voice synthesis model capable of zero-shot voice cloning, intended for academic research, educational purposes, and legitimate applications such as personalized vocal synthesis and assistive technologies.
+
+Please note:
+We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.
+
+
+## Citation
+
+```bibtex
+@misc{soulxsinger,
+title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
+author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
+year={2026},
+eprint={2602.07803},
+archivePrefix={arXiv},
+primaryClass={eess.AS},
+url={https://arxiv.org/abs/2602.07803},
+}
+```
+
+## Contact Us
+If you would like to leave a message about our work, feel free to email qianjiale@soulapp.cn, menghao@soulapp.cn, or wangxinsheng@soulapp.cn.
+
+You’re welcome to join our WeChat or Soul APP group for technical discussions and updates.
+<p align="center">
+
+
+<br>
+<span style="display: inline-block; margin-right: 10px;">
+<img src="assets/soul_wechat01.jpg" width="500" alt="WeChat Group QR Code"/>
+</span>
+</p>
+
+
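The README's download step uses the `hf` CLI. A rough Python equivalent, assuming `huggingface_hub` is installed, calls `snapshot_download` with the same repository IDs and target folders. `download_pretrained` and its `dry_run` flag are hypothetical helpers for illustration, not part of the SoulX-Singer code.

```python
from pathlib import Path

# Repository IDs and target folders taken from the README's download step.
PRETRAINED_REPOS = {
    "Soul-AILab/SoulX-Singer": "pretrained_models/SoulX-Singer",
    "Soul-AILab/SoulX-Singer-Preprocess": "pretrained_models/SoulX-Singer-Preprocess",
}


def download_pretrained(root: str = ".", dry_run: bool = False) -> list[Path]:
    """Fetch each repository snapshot into its folder under `root`.

    With dry_run=True nothing is downloaded; the planned folders are returned.
    """
    targets = []
    for repo_id, rel_dir in PRETRAINED_REPOS.items():
        local_dir = Path(root) / rel_dir
        targets.append(local_dir)
        if not dry_run:
            # Imported lazily so a dry run works without huggingface_hub.
            from huggingface_hub import snapshot_download

            snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return targets
```

Calling `download_pretrained()` from the cloned `SoulX-Singer` directory would place both snapshots where the CLI commands put them.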
assets/soul_wechat01.jpg ADDED

Git LFS Details

  • SHA256: b452c23c33f4d0771f922aed4ceb92c0d6e893e74061f78b69a222f94bbd3c4a
  • Pointer size: 131 Bytes
  • Size of remote file: 835 kB
assets/soulx-logo.png ADDED

Git LFS Details

  • SHA256: 4fe6c191a71be0323d52b236d8ed57f346821ee66c4a9bd8b6232cbca9bf3daf
  • Pointer size: 131 Bytes
  • Size of remote file: 636 kB
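The SHA256 values listed above can confirm that the image assets were fetched in full rather than left as LFS pointer stubs. A minimal standard-library sketch follows; `verify_assets`, `sha256_of_file`, and the assumption that paths are resolved relative to the repository root are illustrative choices.

```python
import hashlib
from pathlib import Path

# Digests copied from the "Git LFS Details" listings above.
EXPECTED_SHA256 = {
    "assets/soul_wechat01.jpg": "b452c23c33f4d0771f922aed4ceb92c0d6e893e74061f78b69a222f94bbd3c4a",
    "assets/soulx-logo.png": "4fe6c191a71be0323d52b236d8ed57f346821ee66c4a9bd8b6232cbca9bf3daf",
}


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_assets(root: str = ".") -> dict[str, bool]:
    """Map each asset path to True if it exists and its digest matches."""
    results = {}
    for rel_path, expected in EXPECTED_SHA256.items():
        path = Path(root) / rel_path
        results[rel_path] = path.is_file() and sha256_of_file(path) == expected
    return results
```

After a complete `git lfs pull`, running `verify_assets()` from the repository root should report `True` for both entries; a pointer stub of about 131 bytes would fail the comparison.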
config.yaml ADDED
@@ -0,0 +1,37 @@
+infer:
+  n_steps: 32
+  cfg: 3
+
+audio:
+  hop_size: 480
+  sample_rate: 24000
+  max_length: 36000
+  n_fft: 1920
+  num_mels: 128
+  win_size: 1920
+  fmin: 0
+  fmax: 12000
+  mel_var: 8.14
+  mel_mean: -4.92
+
+model:
+  encoder:
+    vocab_size: 3000
+    text_dim: 512
+    pitch_dim: 512
+    type_dim: 512
+    f0_bin: 361
+    f0_dim: 512
+    num_layers: 4
+
+  flow_matching:
+    mel_dim: 128
+    hidden_size: 1024
+    num_layers: 22
+    num_heads: 16
+    cfg_drop_prob: 0.2
+    use_embedding: False
+    cond_codebook_size: 512
+    cond_scale_factor: 1
+    sigma: 1e-5
+    time_scheduler: cos
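Some of the numbers above can be sanity-checked by hand: with `sample_rate: 24000` and `hop_size: 480` the mel spectrogram advances 20 ms per frame (50 frames per second), a `win_size`/`n_fft` of 1920 samples is an 80 ms window, and `fmax: 12000` equals the Nyquist frequency. The sketch below encodes that arithmetic. Two parts are assumptions not confirmed by the config: the frame count assumes a centered STFT (hence one extra frame), and `cfg: 3` is assumed to be a classifier-free guidance scale applied in the usual `uncond + scale * (cond - uncond)` form.

```python
# Values copied from the infer and audio sections of config.yaml above.
SAMPLE_RATE = 24000  # audio.sample_rate, Hz
HOP_SIZE = 480       # audio.hop_size, samples between mel frames
WIN_SIZE = 1920      # audio.win_size, samples per analysis window
FMAX = 12000         # audio.fmax, Hz
CFG_SCALE = 3.0      # infer.cfg


def frame_rate_hz() -> float:
    """Mel frames per second: sample_rate / hop_size."""
    return SAMPLE_RATE / HOP_SIZE


def window_ms() -> float:
    """Analysis window length in milliseconds."""
    return 1000 * WIN_SIZE / SAMPLE_RATE


def num_frames(duration_s: float) -> int:
    """Mel frame count for a clip, ASSUMING a centered STFT (1 + samples // hop)."""
    n_samples = round(duration_s * SAMPLE_RATE)
    return 1 + n_samples // HOP_SIZE


def guided(cond: float, uncond: float, scale: float = CFG_SCALE) -> float:
    """Classifier-free guidance mix; ASSUMES infer.cfg is used this way."""
    return uncond + scale * (cond - uncond)
```

Under the centered-STFT assumption, `num_frames(1.0)` gives 51 frames for a one-second clip. The units of `max_length` are not stated in the config, so the sketch leaves it out.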
model.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:447eaf41f91a6b6659d55e9ec3c9b809221724fb8592aebaec35a23751a5b500
+size 2818092278
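The three lines above are a Git LFS pointer, not the weights themselves: the checkpoint is 2,818,092,278 bytes (about 2.8 GB) and is identified by its SHA256 `oid`. If a clone skips the LFS download, `model.pt` on disk is only this small text stub. A minimal sketch that parses the pointer and checks whether a local file has the expected size follows; `parse_lfs_pointer`, `expected_digest_and_size`, and `is_materialized` are illustrative helpers, not part of any library.

```python
import os

# Pointer text copied from the model.pt diff above.
MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:447eaf41f91a6b6659d55e9ec3c9b809221724fb8592aebaec35a23751a5b500
size 2818092278
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of a pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def expected_digest_and_size(text: str) -> tuple[str, int]:
    """Return the sha256 hex digest and byte size recorded in a pointer."""
    fields = parse_lfs_pointer(text)
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return digest, int(fields["size"])


def is_materialized(path: str, pointer_text: str = MODEL_POINTER) -> bool:
    """True if the file at `path` has the size recorded in the pointer.

    A size match is only a cheap first check; hashing the file and comparing
    against the digest is the stronger one.
    """
    _, size = expected_digest_and_size(pointer_text)
    return os.path.isfile(path) and os.path.getsize(path) == size
```

Checking size first avoids hashing a 2.8 GB file when the download clearly did not happen.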