title: SoulX-Singer
emoji: 🎤
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
python_version: '3.10'
suggested_hardware: zero-a10g
🎤 SoulX-Singer
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
🎵 Overview
SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers.
It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.
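To illustrate how the two control modes relate, the sketch below expands a note sequence into a per-frame F0 contour using the standard equal-temperament conversion (A4 = MIDI note 69 = 440 Hz). The `(midi_note, duration_seconds)` input format, the 100 frames-per-second rate, and the function names are assumptions made for this example; they are not the metadata format or API used by SoulX-Singer.

```python
A4_MIDI = 69      # MIDI note number of A4
A4_HZ = 440.0     # reference tuning assumed for this sketch

def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to frequency in Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)

def notes_to_f0(notes, frame_rate=100.0):
    """Expand (midi_note, duration_seconds) pairs into a flat per-frame F0 list.

    A note value of None marks a rest and is rendered as 0.0 Hz (unvoiced).
    The frame rate is a hypothetical value chosen for illustration.
    """
    f0 = []
    for note, dur in notes:
        n_frames = round(dur * frame_rate)
        hz = 0.0 if note is None else midi_to_hz(note)
        f0.extend([hz] * n_frames)
    return f0

score = [(60, 0.5), (None, 0.25), (69, 0.5)]  # C4, rest, A4
contour = notes_to_f0(score)
print(len(contour), round(contour[0], 2), contour[-1])
```

A score gives a piecewise-constant contour like this one, while a melody (F0) input extracted from real singing also carries vibrato and pitch glides, which is why the two modes offer different levels of control.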
✨ Key Features
- 🎤 Zero-Shot Singing – Generate high-fidelity voices for unseen singers, no fine-tuning needed.
- 🎵 Flexible Control Modes – Melody (F0) and Score (MIDI) conditioning.
- 📚 Large-Scale Dataset – 42,000+ hours of aligned vocals, lyrics, and notes across Mandarin, English, and Cantonese.
- 🧑‍🎤 Timbre Cloning – Preserve singer identity across languages, styles, and edited lyrics.
- ✏️ Singing Voice Editing – Modify lyrics while keeping natural prosody.
- 🌐 Cross-Lingual Synthesis – High-fidelity synthesis by disentangling timbre from content.
🎬 Demo Examples
📰 News
- [2026-02-06] SoulX-Singer inference code and models released.
🚀 Quick Start
Note: This repo does not ship pretrained weights. SVS and preprocessing models must be downloaded from Hugging Face (see step 3).
1. Clone Repository
git clone https://github.com/Soul-AILab/SoulX-Singer.git
cd SoulX-Singer
2. Set Up Environment
1. Install Conda (if not already installed): https://docs.conda.io/en/latest/miniconda.html
2. Create and activate a Conda environment:
conda create -n soulxsinger -y python=3.10
conda activate soulxsinger
3. Install dependencies:
pip install -r requirements.txt
⚠️ If you are in mainland China, use a PyPI mirror:
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
3. Download Pretrained Models
This repository does not include pretrained models. You must download them from Hugging Face:
- Soul-AILab/SoulX-Singer (SVS model)
- Soul-AILab/SoulX-Singer-Preprocess (preprocessing models)
Install Hugging Face Hub and download:
pip install -U huggingface_hub
# SoulX-Singer SVS model
huggingface-cli download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer
# Preprocessing models (vocal separation, F0, ASR, etc.)
huggingface-cli download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess
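If you prefer to download from Python rather than the CLI, a minimal sketch using `snapshot_download` from `huggingface_hub` could look like the following. The repository IDs and target directories mirror the commands above; the `MODELS` mapping and the `download_models` helper are names invented for this example and are not part of the repository.

```python
from pathlib import Path

# Repository IDs and target directories copied from the CLI commands above.
MODELS = {
    "Soul-AILab/SoulX-Singer": "pretrained_models/SoulX-Singer",
    "Soul-AILab/SoulX-Singer-Preprocess": "pretrained_models/SoulX-Singer-Preprocess",
}

def download_models(models=MODELS):
    """Download each model repository into its local directory.

    Hypothetical helper: returns the list of local paths reported by the Hub client.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id, local_dir in models.items():
        Path(local_dir).mkdir(parents=True, exist_ok=True)
        paths.append(snapshot_download(repo_id=repo_id, local_dir=local_dir))
    return paths
```

Calling `download_models()` from the repository root should leave the same `pretrained_models/` layout as the two CLI commands.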
4. Run the Demo
Run the inference demo:
bash example/infer.sh
This script relies on metadata generated by the preprocessing pipeline, including vocal separation and transcription. To run the demo on your own data, first follow the steps in preprocess to prepare this metadata.
⚠️ Important Note: The metadata produced by the automatic preprocessing pipeline may not perfectly align the singing audio with the corresponding lyrics and musical notes. For best synthesis quality, we strongly recommend manually correcting the alignment using the 🎼 Midi-Editor.
🌐 WebUI
You can launch the interactive interface with:
python webui.py
🚀 Deploy as Hugging Face Space
This repo is ready to deploy as a Hugging Face Space. Pretrained models are not included; app.py downloads them from the Hub on first run.
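📖 For a detailed deployment guide, see DEPLOY.md.
Quick steps:
- Create a Space: go to huggingface.co/spaces, click "Create new Space", and select the Gradio SDK.
- Upload the code: push with Git or upload the code files through the web interface.
- Configure hardware: in Space Settings, select GPU T4 Small (recommended) to speed up inference.
- Wait for startup: the Space automatically installs dependencies, downloads the models, and launches the app (the first run may take 5-15 minutes).
Models are downloaded automatically from the following repositories:
- Soul-AILab/SoulX-Singer (SVS model)
- Soul-AILab/SoulX-Singer-Preprocess (preprocessing models)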
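A first-run download of this kind is often implemented as a check on the local model directories before fetching. The sketch below shows one way to write such a guard; it is an assumption about the general pattern, not a copy of what app.py does, and `needs_download`, `ensure_models`, and the `fetch` callback are hypothetical names.

```python
from pathlib import Path

def needs_download(local_dir) -> bool:
    """Return True when the directory is missing or contains no entries."""
    path = Path(local_dir)
    return not path.is_dir() or not any(path.iterdir())

def ensure_models(models, fetch):
    """Call fetch(repo_id, local_dir) only for models that are not on disk yet.

    `models` maps a Hub repository ID to a local directory.
    Returns the list of repository IDs that were fetched.
    """
    fetched = []
    for repo_id, local_dir in models.items():
        if needs_download(local_dir):
            fetch(repo_id, local_dir)
            fetched.append(repo_id)
    return fetched
```

Passing the downloader in as `fetch` keeps the guard independent of any particular Hub client, so later launches of the Space can skip the download once the directories are populated.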
🚧 Roadmap
- 🖥️ Web-based UI for easy and interactive inference
- 🌐 Online demo deployment on Hugging Face Spaces
- 📊 Release the SoulX-Singer-Eval benchmark
- 📚 Comprehensive tutorials and usage documentation
🙏 Acknowledgements
Special thanks to the following open-source projects:
- F5-TTS
- Amphion
- Music Source Separation Training
- Lead Vocal Separation
- Vocal Dereverberation
- RMVPE
- Paraformer
- Parakeet-tdt-0.6b-v2
- ROSVOT
📄 License
SoulX-Singer is released under the Apache 2.0 license. Researchers and developers are free to use the code and model weights. See LICENSE for details.
⚠️ Usage Disclaimer
SoulX-Singer is intended for academic research, educational purposes, and legitimate applications such as personalized singing synthesis and assistive technologies.
Please note:
- 🎤 Respect intellectual property, privacy, and personal consent when generating singing content.
- 🚫 Do not use the model to impersonate individuals without authorization or to create deceptive audio.
- ⚠️ The developers assume no liability for any misuse of this model.
We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles. For ethics or misuse concerns, please contact us.
📬 Contact Us
We welcome your feedback, questions, and collaboration:
Email: qianjiale@soulapp.cn | menghao@soulapp.cn | wangxinsheng@soulapp.cn
Join discussions: WeChat or Soul APP groups for technical discussions and updates.