---

language:
- en
- zh
library_name: huggingface_hub
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-audio
- music
- singing-voice-synthesis
- svs
- zero-shot

---

## ComfyUI Custom Node

A custom node is available for ComfyUI integration:

🔗 **[ComfyUI-SoulX-Singer](https://github.com/Saganaki22/ComfyUI-SoulX-Singer)**


![Screenshot 2026-02-11 160905](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/FqxVnkFDrVt287ppwQj90.png)

Use this custom node to run SoulX-Singer singing voice synthesis inside your ComfyUI workflows.

# SoulX-Singer: Converted .pt model to .safetensors
Weights are provided in two precisions: **bf16** and **fp32**.
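
As a minimal sketch of fetching the converted weights, the snippet below uses `snapshot_download` from `huggingface_hub`. The repository id is taken from the sample URLs on this page; the target directory and the file patterns are assumptions for illustration, since the exact file layout is not listed here.

```python
# Repository id as it appears in the audio sample URLs on this page.
REPO_ID = "drbaph/SoulX-Singer"


def download_weights(local_dir="checkpoints/SoulX-Singer",
                     patterns=("*.safetensors", "*.json", "*.yaml")):
    """Download matching files from the model repo and return the local path.

    `local_dir` and `patterns` are illustrative assumptions, not values
    documented by the model authors; adjust them to the actual repo contents.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=list(patterns),
    )


# Example call (requires network access and `pip install huggingface_hub`):
# path = download_weights()
```

Restricting the download with `allow_patterns` avoids pulling the audio samples when only the weights are needed.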

## Audio Samples

### Original Audio
<audio controls>
  <source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/song.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

### SpongeBob Voice
<audio controls>
  <source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-1.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

### Male Voice
<audio controls>
  <source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-2.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

---

<div align="center">
    <b><em>Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
    <p>
    <img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
    </p>
    <p>
    <a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="version"></a>
    <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
    <a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
    <a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
    <a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
    </p>
</div>

---

## Overview

**SoulX-Singer** is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.

For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).
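
The two control modes mentioned above are related by the standard equal-temperament mapping between MIDI note numbers and fundamental frequency. The sketch below illustrates that mapping and expands a list of notes into a flat per-frame F0 contour. The helper names, the 50 frames-per-second rate, and the use of 0 Hz for rests are assumptions for illustration; they do not describe SoulX-Singer's actual input format.

```python
import math

A4_MIDI = 69     # MIDI note number of A4
A4_HZ = 440.0    # concert pitch reference


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to frequency in Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq: float) -> float:
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq / A4_HZ)


def notes_to_f0(notes, frame_rate=50):
    """Expand (midi_pitch, duration_seconds) pairs into a per-frame F0 list.

    A pitch of None marks a rest and is written as 0.0 Hz (an assumed
    convention). Each note is held flat, so the result has no vibrato or
    pitch glides, unlike an F0 contour extracted from real singing.
    """
    contour = []
    for pitch, duration in notes:
        n_frames = round(duration * frame_rate)
        hz = 0.0 if pitch is None else midi_to_hz(pitch)
        contour.extend([hz] * n_frames)
    return contour


# A4 for 0.1 s followed by a 0.04 s rest, at 50 frames per second.
f0 = notes_to_f0([(69, 0.1), (None, 0.04)])
```

A score specifies only the discrete note grid, whereas a melody contour carries frame-level pitch detail, which is why the two conditioning modes offer different degrees of control.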


---

## Features

- **Zero-shot synthesis**: Generate singing voices for unseen singers without fine-tuning
- **Melody-conditioned control**: Use F0 contour for pitch guidance
- **Score-conditioned control**: Use MIDI notes for precise musical notation
- **High-fidelity output**: Realistic vocal synthesis with natural expression
- **Safetensors format**: Model weights converted from `.pt` to `.safetensors`, provided in bf16 and fp32 precision
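
To give a feel for what the bf16 variant trades away relative to fp32, the sketch below emulates the conversion in pure Python. bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an fp32 value, so each weight takes 2 bytes instead of 4. The helper names are hypothetical, and this is an illustration of the number format only, not the conversion script used for this repository.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 and return it as a float.

    Uses round-to-nearest, ties-to-even on the 16 low bits that bfloat16
    drops. NaN and infinity are not handled specially in this sketch.
    """
    bits = fp32_bits(x)
    # Add 0x7FFF, plus 1 if the lowest kept bit is set, so ties go to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# 0.1 is not exactly representable; bf16 keeps roughly 2-3 decimal digits.
approx = to_bf16(0.1)
relative_error = abs(approx - 0.1) / 0.1
```

The fp32 file is the safer choice when exact parity with the original checkpoint matters; the bf16 file roughly halves storage and memory at the cost of this reduced mantissa precision.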

---

## Citation

If you use SoulX-Singer in your research, please cite:

```bibtex
@article{soulxsinger2026,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Soul-AILab},
  journal={arXiv preprint arXiv:2602.07803},
  year={2026}
}
```

---

## License

This project is licensed under the Apache License 2.0.