	---

	language:
	- en
	- zh
	library_name: huggingface_hub
	license: apache-2.0
	pipeline_tag: text-to-speech
	tags:
	- text-to-audio
	- music
	- singing-voice-synthesis
	- svs
	- zero-shot

	---

	## ComfyUI Custom Node

	A custom node is available for ComfyUI integration:

	🔗 [ComfyUI-SoulX-Singer](https://github.com/Saganaki22/ComfyUI-SoulX-Singer)


	![Screenshot 2026-02-11 160905](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/FqxVnkFDrVt287ppwQj90.png)

	Use this custom node to integrate SoulX-Singer into your ComfyUI workflows for seamless singing voice synthesis.

	# SoulX-Singer: .pt model converted to .safetensors
	The converted weights are provided in bf16 and fp32 precision.
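
To check which precision a downloaded file actually holds, the safetensors header can be read without loading any tensor data. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function names are illustrative and not part of this repository.

```python
import json
import struct
from collections import Counter


def parse_header(blob: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a .safetensors file."""
    # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len].decode("utf-8"))


def read_header(path: str) -> dict:
    """Read only the header of a .safetensors file from disk."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(header_len))


def dtype_counts(header: dict) -> Counter:
    """Count tensors per dtype string (e.g. "BF16", "F32"), skipping metadata."""
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    )
```

For example, `dtype_counts(read_header("model.safetensors"))` (the filename is a placeholder for whichever weight file you downloaded) should report mostly `BF16` entries for a bf16 file and `F32` entries for an fp32 file.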

	## Audio Samples

	### Original Audio
	<audio controls>
	<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/song.mp3" type="audio/mpeg">
	Your browser does not support the audio element.
	</audio>

	### SpongeBob Voice
	<audio controls>
	<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-1.mp3" type="audio/mpeg">
	Your browser does not support the audio element.
	</audio>

	### Male Voice
	<audio controls>
	<source src="https://huggingface.co/drbaph/SoulX-Singer/resolve/main/samples/generated/sample-2.mp3" type="audio/mpeg">
	Your browser does not support the audio element.
	</audio>

	---

	<div align="center">
	<b><em>Towards High-Quality Zero-Shot Singing Voice Synthesis</em></b>
	<p>
	<img src="assets/soulx-logo.png" alt="SoulX-Singer_Logo" style="height: 80px;">
	</p>
	<p>
	<a href="https://soul-ailab.github.io/soulx-singer/"><img src="https://img.shields.io/badge/Demo-Page-lightgrey" alt="Demo"></a>
	<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src='https://img.shields.io/badge/Github-Page-green' alt="Github"></a>
	<a href="https://arxiv.org/abs/2602.07803"><img src="https://img.shields.io/badge/arXiv-2602.07803-b31b1b" alt="arXiv"></a>
	<a href="https://github.com/Soul-AILab/SoulX-Singer/blob/main/assets/technical-report.pdf"><img src='https://img.shields.io/badge/Report-Github?label=Technical&color=red' alt="technical report"></a>
	<a href="https://github.com/Soul-AILab/SoulX-Singer"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="Apache-2.0"></a>
	</p>
	</div>

	---

	## Overview

	SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.

	For more details, please refer to the paper: [SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis](https://arxiv.org/abs/2602.07803).
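
The two control modes describe the same pitch information at different granularity: a score gives discrete MIDI notes, while a melody gives a frame-level F0 contour in Hz. As a rough illustration of how they relate, the sketch below uses the standard equal-temperament mapping (A4 = MIDI 69 = 440 Hz). It is not the model's input format; `notes_to_f0` and its `frame_rate` parameter are hypothetical names chosen for this example.

```python
import math

A4_MIDI = 69     # MIDI note number of A4
A4_HZ = 440.0    # reference tuning frequency


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to frequency in Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def notes_to_f0(notes, frame_rate=100):
    """Expand (midi_note, duration_seconds) pairs into a flat per-frame F0 contour.

    A note of None is treated as a rest and rendered as 0.0 Hz (unvoiced).
    frame_rate is frames per second; 100 is an arbitrary value for illustration.
    """
    contour = []
    for note, duration in notes:
        n_frames = round(duration * frame_rate)
        hz = 0.0 if note is None else midi_to_hz(note)
        contour.extend([hz] * n_frames)
    return contour
```

A real sung F0 contour also carries vibrato, glides, and other expressive detail that a flat, note-derived contour like this one lacks, which is why the two conditioning modes are distinct.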


	---

	## Features

	- Zero-shot synthesis: Generate singing voices for unseen singers without fine-tuning
	- Melody-conditioned control: Use F0 contour for pitch guidance
	- Score-conditioned control: Use MIDI notes for precise musical notation
	- High-fidelity output: Realistic vocal synthesis with natural expression
	- Safetensors format: Model weights converted to .safetensors, in bf16 and fp32 precision
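
On the choice between the two precisions: bf16 keeps the sign and 8 exponent bits of fp32 but only 7 of its 23 mantissa bits, so files are half the size at the cost of precision. The sketch below emulates the conversion with the standard library to make the trade-off concrete. It assumes round-to-nearest-even and finite inputs, and it is not the conversion script used for this repository.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round a finite fp32 value to bf16 and return its 16-bit pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest, ties to even: the bias depends on the lowest kept bit.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit bf16 pattern back to a Python float (exact)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def bf16_round_trip(x: float) -> float:
    """Value of x after being stored as bf16 and read back."""
    return bf16_bits_to_float(fp32_to_bf16_bits(x))
```

Values such as `1.0` survive unchanged, while `3.14159265` becomes `3.140625`, roughly two to three significant decimal digits. If you are unsure which file to use, fp32 is the conservative choice and bf16 the smaller one.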

	---

	## Citation

	If you use SoulX-Singer in your research, please cite:

	```bibtex
	@article{soulxsinger2025,
	title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
	author={Soul-AILab},
	journal={arXiv preprint arXiv:2602.07803},
	year={2026}
	}
	```

	---

	## License

	This project is licensed under the Apache License 2.0.