Aloukik21 committed 37a0b87 (verified) · 1 parent: 8fbfd2d

Add SoulX-Podcast: TTS/SoulX-Podcast-1.7B/README.md

Files changed (1): TTS/SoulX-Podcast-1.7B/README.md (added, +185 -0)
---
license: apache-2.0
language:
- en
- zh
tags:
- text-to-speech
---

<div align="center">
<h1>
SoulX-Podcast
</h1>
<p>
Official inference code for <br>
<b><em>SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity</em></b>
</p>
<p>
<!-- <img src="assets/XiaoHongShu_Logo.png" alt="Institution 4" style="width: 102px; height: 48px;"> -->
<img src="assets/SoulX-Podcast-log.jpg" alt="SoulX-Podcast_Logo" style="width: 200px; height: 68px;">
</p>
<p>
</p>
<a href="https://soul-ailab.github.io/soulx-podcast/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="Demo page"></a>
<a href="https://github.com/Soul-AILab/SoulX-Podcast"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
<a href="https://arxiv.org/pdf/2510.23541"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
<a href="https://github.com/Soul-AILab/SoulX-Podcast"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
</div>


<p align="center">
<h1>SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity</h1>
</p>

## Overview
SoulX-Podcast is designed for podcast-style, multi-turn, multi-speaker dialogic speech generation, while also achieving superior performance on the conventional monologue TTS task.

To meet the higher naturalness demands of multi-turn spoken dialogue, SoulX-Podcast integrates a range of paralinguistic controls. It supports Mandarin and English as well as several Chinese dialects, including Sichuanese, Henanese, and Cantonese, enabling more personalized podcast-style speech generation.

## Key Features 🔥

- **Long-form, multi-turn, multi-speaker dialogic speech generation**: SoulX-Podcast excels at generating high-quality, natural-sounding dialogic speech for multi-turn, multi-speaker scenarios.

- **Cross-dialectal, zero-shot voice cloning**: SoulX-Podcast supports zero-shot voice cloning across different Chinese dialects, enabling high-quality, personalized speech in any of the supported dialects.

- **Paralinguistic controls**: SoulX-Podcast supports a variety of paralinguistic events, such as ***laughter*** and ***sighs***, to enhance the realism of the synthesized speech.

<table align="center">
<tr>
<td align="center"><br><img src="assets/performance_radar.png" width="80%" /></td>
</tr>
</table>

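The features above center on multi-turn, multi-speaker scripts. As an illustration of what such an input looks like as a data structure, the sketch below flattens a list of speaker turns into one tagged text string. The `[S{n}]` speaker-tag template, the `Turn` class, and `format_dialogue` are hypothetical assumptions, not the repository's documented API; check the example scripts in the repository for the input format the model actually expects.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One utterance in a podcast script."""
    speaker: int  # 1-based speaker index
    text: str


# Hypothetical speaker-tag template; the real SoulX-Podcast input format may differ.
DEFAULT_TAG = "[S{n}]"


def format_dialogue(turns: list[Turn], tag: str = DEFAULT_TAG) -> str:
    """Join speaker turns into one tagged script, one turn per line."""
    lines = []
    for turn in turns:
        if turn.speaker < 1:
            raise ValueError(f"speaker index must be >= 1, got {turn.speaker}")
        text = turn.text.strip()
        if not text:
            continue  # skip empty turns rather than emitting a bare tag
        lines.append(f"{tag.format(n=turn.speaker)}{text}")
    return "\n".join(lines)


if __name__ == "__main__":
    script = [
        Turn(1, "Welcome back to the show!"),
        Turn(2, "Thanks, it's great to be here."),
    ]
    print(format_dialogue(script))
```

Keeping the tag as a parameter means the helper does not hard-code a format it cannot verify; only the template string would need to change.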
## Install

### Clone and Install
The following instructions are for Linux.
- Clone the repo:
```sh
git clone git@github.com:Soul-AILab/SoulX-Podcast.git
cd SoulX-Podcast
```
- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create a Conda env and install the dependencies:
```sh
conda create -n soulxpodcast -y python=3.11
conda activate soulxpodcast
pip install -r requirements.txt
# If you are in mainland China, you can use a mirror instead:
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
```

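After activating the environment, a quick sanity check can catch the most common setup mistake: running outside the `soulxpodcast` env. The sketch below is a hypothetical helper, not part of the repository. It compares the interpreter version against the 3.11 used in the commands above and reports which modules cannot be found. The dependency names in the `__main__` block are assumptions; adjust them to whatever `requirements.txt` actually lists.

```python
import importlib.util
import sys


def python_matches(required=(3, 11), current=None) -> bool:
    """Return True if the interpreter's major.minor version equals `required`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current[:2]) == tuple(required)


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list of key dependencies; check requirements.txt for the real one.
    deps = ["torch", "transformers", "huggingface_hub"]
    if not python_matches():
        print(f"Warning: running Python {sys.version.split()[0]}, expected 3.11")
    gone = missing_modules(deps)
    print("Missing modules:", ", ".join(gone) if gone else "none")
```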
### Model Download

Download via the Hugging Face CLI:
```sh
pip install -U huggingface_hub

# base model
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B --local-dir pretrained_models/SoulX-Podcast-1.7B

# dialectal model
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B-dialect --local-dir pretrained_models/SoulX-Podcast-1.7B-dialect
```

Download via Python:
```python
from huggingface_hub import snapshot_download

# base model
snapshot_download("Soul-AILab/SoulX-Podcast-1.7B", local_dir="pretrained_models/SoulX-Podcast-1.7B")

# dialectal model
snapshot_download("Soul-AILab/SoulX-Podcast-1.7B-dialect", local_dir="pretrained_models/SoulX-Podcast-1.7B-dialect")
```

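When the download is part of a setup script, it can be convenient to fetch a model only if its local directory is missing or empty. A minimal sketch under that assumption: the `MODELS` mapping restates the repo IDs and paths from the commands above, while `needs_download` and `ensure_models` are hypothetical helpers. `snapshot_download` is imported lazily so the presence check itself only needs the standard library.

```python
from pathlib import Path

# Repo IDs and local directories, restated from the download commands above.
MODELS = {
    "base": ("Soul-AILab/SoulX-Podcast-1.7B", "pretrained_models/SoulX-Podcast-1.7B"),
    "dialect": ("Soul-AILab/SoulX-Podcast-1.7B-dialect", "pretrained_models/SoulX-Podcast-1.7B-dialect"),
}


def needs_download(local_dir) -> bool:
    """True if `local_dir` does not exist or contains no entries."""
    path = Path(local_dir)
    return not path.is_dir() or not any(path.iterdir())


def ensure_models(variants=("base", "dialect"), root="."):
    """Download each requested variant unless its directory is already populated."""
    fetched = []
    for name in variants:
        repo_id, rel_dir = MODELS[name]
        local_dir = Path(root) / rel_dir
        if needs_download(local_dir):
            # Imported here so the check above works without huggingface_hub installed.
            from huggingface_hub import snapshot_download
            snapshot_download(repo_id, local_dir=str(local_dir))
            fetched.append(name)
    return fetched


if __name__ == "__main__":
    print("Downloaded:", ensure_models() or "nothing (already present)")
```

Non-emptiness is a coarse check: an interrupted download also leaves a non-empty directory, so delete that directory and rerun if a previous download did not finish.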
Download via git clone:
```sh
mkdir -p pretrained_models

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

# base model
git clone https://huggingface.co/Soul-AILab/SoulX-Podcast-1.7B pretrained_models/SoulX-Podcast-1.7B

# dialectal model
git clone https://huggingface.co/Soul-AILab/SoulX-Podcast-1.7B-dialect pretrained_models/SoulX-Podcast-1.7B-dialect
```

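If git-lfs was not active when cloning, the large weight files are left behind as small text pointer files, and loading them fails later with confusing errors. Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, so a short scan can detect them. The sketch below is a hypothetical helper, not part of the repository; the 1 KiB size cutoff is an assumption (pointer files are typically only a few hundred bytes).

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Assumed upper bound on pointer size; real weight files are far larger.
MAX_POINTER_BYTES = 1024


def is_lfs_pointer(path) -> bool:
    """True if `path` looks like an un-fetched Git LFS pointer file."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size > MAX_POINTER_BYTES:
        return False
    with p.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir):
    """List files under `model_dir` that are still LFS pointers, skipping .git."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if ".git" not in p.relative_to(root).parts and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    for d in ("pretrained_models/SoulX-Podcast-1.7B",
              "pretrained_models/SoulX-Podcast-1.7B-dialect"):
        pointers = find_lfs_pointers(d)
        if pointers:
            print(f"{d}: {len(pointers)} pointer file(s); run `git lfs pull` inside it")
```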
### Basic Usage

You can run the dialogue demo with the following command:
```sh
# dialogue inference
bash example/infer_dialogue.sh
```

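To launch the same demo from Python (for example, from a notebook or a batch job), a thin `subprocess` wrapper is enough. This is a hypothetical helper, not part of the repository; it assumes `repo_dir` is the cloned repository root and that `bash` is on the `PATH`.

```python
import subprocess
from pathlib import Path

DEMO_SCRIPT = "example/infer_dialogue.sh"  # path taken from the command above


def demo_command(repo_dir=".", script=DEMO_SCRIPT):
    """Build the argv for the demo, failing early if the script is missing."""
    script_path = Path(repo_dir) / script
    if not script_path.is_file():
        raise FileNotFoundError(f"demo script not found: {script_path}")
    return ["bash", script]


def run_demo(repo_dir=".", script=DEMO_SCRIPT):
    """Run the demo with `repo_dir` as working directory; raise on non-zero exit."""
    cmd = demo_command(repo_dir, script)
    return subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    run_demo()
```

Running with `cwd=repo_dir` keeps any relative paths inside the shell script (such as `pretrained_models/...`) resolving the same way as when the script is started from the repository root.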
## TODOs
- [x] Add example scripts for monologue TTS.
- [x] Publish the [technical report](https://arxiv.org/pdf/2510.23541).
- [x] Develop a WebUI for easy inference.
- [x] Deploy an online demo on Hugging Face Spaces.
- [x] Dockerize the project with vLLM support.
- [ ] Add support for streaming inference.

## Citation

```bibtex
@misc{SoulXPodcast,
  title = {SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity},
  author = {Hanke Xie and Haopeng Lin and Wenxiao Cao and Dake Guo and Wenjie Tian and Jun Wu and Hanlin Wen and Ruixuan Shang and Hongmei Liu and Zhiqi Jiang and Yuepeng Jiang and Wenxi Chen and Ruiqi Yan and Jiale Qian and Yichao Yan and Shunshun Yin and Ming Tao and Xie Chen and Lei Xie and Xinsheng Wang},
  year = {2025},
  archivePrefix = {arXiv},
  url = {https://arxiv.org/abs/2510.23541}
}
```

## License

We use the Apache 2.0 license. Researchers and developers are free to use the code and model weights of SoulX-Podcast. See [LICENSE](LICENSE) for details.

## Acknowledgements
- This repo benefits from [FlashCosyVoice](https://github.com/xingchensong/FlashCosyVoice/tree/main).

## Usage Disclaimer
This project provides a speech synthesis model for podcast generation that is capable of zero-shot voice cloning. It is intended for academic research, educational purposes, and legitimate applications such as personalized speech synthesis, assistive technologies, and linguistic research.

Please note:

- Do not use this model for unauthorized voice cloning, impersonation, fraud, scams, deepfakes, or any illegal activities.
- Ensure compliance with local laws and regulations when using this model, and uphold ethical standards.
- The developers assume no liability for any misuse of this model.

We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles in AI research and applications. If you have any concerns regarding ethics or misuse, please contact us.

## Contact Us
If you would like to leave a message about our work, feel free to email hkxie@mail.nwpu.edu.cn, linhaopeng@soulapp.cn, lxie@nwpu.edu.cn, or wangxinsheng@soulapp.cn.

You’re welcome to join our WeChat group for technical discussions and updates.
<p align="center">
<!-- <em>Due to group limits, if you can't scan the QR code, please add my WeChat for group access -->
<!-- : <strong>Tiamo James</strong></em> -->
<br>
<span style="display: inline-block; margin-right: 10px;">
<img src="assets/wechat2.jpg" width="300" alt="WeChat Group QR Code"/>
</span>
<!-- <span style="display: inline-block;">
<img src="assets/wechat_tiamo.jpg" width="300" alt="WeChat QR Code"/>
</span> -->
</p>

<!-- <p align="center">
<img src="src/figs/npu@aslp.jpeg" width="500"/>
</p -->
<!-- <img src="assets/wechat.jpg -->