---
language:
- en
- zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-audio
- music
- singing-voice-synthesis
- svs
- zero-shot
---

## ComfyUI Custom Node

A custom node is available for ComfyUI integration:

🔗 **[ComfyUI-SoulX-Singer](https://github.com/Saganaki22/ComfyUI-SoulX-Singer)**

Use this custom node to run SoulX-Singer singing voice synthesis inside your ComfyUI workflows.

# SoulX-Singer: Converted .pt model to .safetensors
**bf16 + fp32**

## Audio Samples

### Original Audio
<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/song.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### SpongeBob Voice
<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-1.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### Male Voice
<audio controls>
<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-2.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

---


<div align="center">
<b><em>Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
<p>
<img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
</p>
<p>
<a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="Demo"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
<a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
</p>
</div>

---

## Overview

**SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.

For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).

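
To make the relationship between the two control modes concrete, here is a minimal sketch of how MIDI notes map to an F0 contour using the standard equal-temperament formula f = 440 · 2^((n - 69)/12). It is not SoulX-Singer's actual input format or API: the helper names `midi_to_hz` and `score_to_f0`, the rate of 100 frames per second, and the example score are all illustrative assumptions. A score fixes one pitch per note, whereas a real F0 contour extracted from a recording also carries vibrato and pitch glides, which is why the two forms of conditioning differ in how much expressive detail they convey.

```python
A4_MIDI = 69    # MIDI note number of A4
A4_HZ = 440.0   # assumed reference tuning (standard concert pitch)


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def score_to_f0(notes, frame_rate=100):
    """Expand (midi_note, duration_seconds) pairs into a per-frame F0 list.

    A note of None is treated as a rest and rendered as 0.0 Hz (unvoiced).
    The frame rate is an assumption, not a value taken from the model.
    """
    f0 = []
    for note, duration in notes:
        n_frames = round(duration * frame_rate)
        hz = 0.0 if note is None else midi_to_hz(note)
        f0.extend([hz] * n_frames)
    return f0


# Hypothetical score: A4 for 0.5 s, a 0.25 s rest, then C5 for 0.5 s.
score = [(69, 0.5), (None, 0.25), (72, 0.5)]
contour = score_to_f0(score)
print(len(contour), contour[0], round(contour[-1], 2))
```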

---

## Features

- **Zero-shot synthesis**: Generate singing voices for unseen singers without fine-tuning
- **Melody-conditioned control**: Use an F0 contour for pitch guidance
- **Score-conditioned control**: Use MIDI notes to specify the melody as a musical score
- **High-fidelity output**: Realistic vocal synthesis with natural expression
- **Safetensors format**: Model weights provided as `.safetensors` in bf16 and fp32 precision
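
To check which precision a downloaded weight file uses, the header of a `.safetensors` file can be read without loading any tensors. The sketch below uses only the Python standard library and relies on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header. The weight file names in this repository are not listed here, so the example builds a tiny stand-in file; `write_demo_file` and its tensor names are hypothetical, and for a real check you would pass the path of the downloaded file to `read_safetensors_header` instead.

```python
import json
import os
import struct
import tempfile
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def dtype_summary(header):
    """Count tensors per dtype string (for example "BF16" or "F32")."""
    return Counter(
        info["dtype"]
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    )


def write_demo_file(path):
    """Write a tiny stand-in file (hypothetical tensor names and data)."""
    header = {
        "layer.weight": {"dtype": "BF16", "shape": [2], "data_offsets": [0, 4]},
        "layer.bias": {"dtype": "F32", "shape": [1], "data_offsets": [4, 8]},
    }
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)
        f.write(bytes(8))  # 8 zero bytes standing in for tensor data


demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_demo_file(demo_path)
print(dtype_summary(read_safetensors_header(demo_path)))
```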

---

## Citation

If you use SoulX-Singer in your research, please cite:

```bibtex
@article{soulxsinger2025,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}
```

---

## License

This project is licensed under the Apache License 2.0.