Add README.md
README.md
ADDED
@@ -0,0 +1,143 @@
---
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-audio
base_model:
- Soul-AILab/SoulX-Singer
tags:
- mlx
- apple-silicon
- singing-voice-synthesis
- singing-voice-conversion
- soulx-singer
---

# SoulX-Singer-bf16

Apple MLX safetensors checkpoint for [`Soul-AILab/SoulX-Singer`](https://huggingface.co/Soul-AILab/SoulX-Singer), including both SoulX-Singer SVS and SoulX-Singer-SVC weights.

Collection: [https://huggingface.co/collections/mlx-community/soulx-singer-mlx-6a1c0525d0911ea400102840](https://huggingface.co/collections/mlx-community/soulx-singer-mlx-6a1c0525d0911ea400102840)

This is not a pure end-to-end MLX audio runtime yet. The weights are converted to an MLX-friendly safetensors layout, while full audio generation currently uses the official PyTorch model structure through [`ailuntx/SoulX-Singer-MLX`](https://github.com/ailuntx/SoulX-Singer-MLX).

## TL;DR

| | |
|---|---|
| **Precision** | `bf16` |
| **Disk** | 2.6G |
| **Components** | `svs/` and `svc/` |
| **Runtime / bridge** | [`ailuntx/SoulX-Singer-MLX`](https://github.com/ailuntx/SoulX-Singer-MLX) |
| **Official model** | [`Soul-AILab/SoulX-Singer`](https://huggingface.co/Soul-AILab/SoulX-Singer) |
| **Official code** | [`Soul-AILab/SoulX-Singer`](https://github.com/Soul-AILab/SoulX-Singer) |

## Quick Start

```bash
git clone https://github.com/ailuntx/SoulX-Singer-MLX.git
cd SoulX-Singer-MLX
python -m venv .venv
.venv/bin/pip install -U pip
.venv/bin/pip install -r requirements.txt mlx safetensors huggingface_hub hf_transfer

HF_HUB_ENABLE_HF_TRANSFER=1 .venv/bin/hf download mlx-community/SoulX-Singer-bf16 --local-dir ./models/SoulX-Singer-bf16
```

Run a short SVS bridge test:

```bash
PYTORCH_ENABLE_MPS_FALLBACK=1 \
SOULX_WHISPER_MODEL=pretrained_models/openai__whisper-base \
.venv/bin/python scripts/inference_mlx_bridge.py \
  --model ./models/SoulX-Singer-bf16 \
  --component svs \
  --device mps \
  --prompt_wav_path example/audio/zh_prompt.mp3 \
  --prompt_metadata_path example/audio/zh_prompt.json \
  --target_metadata_path example/audio/zh_target.json \
  --control melody \
  --n_steps 1 \
  --cfg 1 \
  --save_dir outputs_mlx_bridge/svs
```

Run an SVC bridge test:

```bash
PYTORCH_ENABLE_MPS_FALLBACK=1 \
SOULX_WHISPER_MODEL=pretrained_models/openai__whisper-base \
.venv/bin/python scripts/inference_mlx_bridge.py \
  --model ./models/SoulX-Singer-bf16 \
  --component svc \
  --device mps \
  --prompt_wav_path example/audio/zh_prompt.mp3 \
  --target_wav_path example/audio/music.mp3 \
  --prompt_f0_path example/audio/zh_prompt_f0.npy \
  --target_f0_path example/audio/music_f0.npy \
  --n_steps 1 \
  --cfg 1 \
  --save_dir outputs_mlx_bridge/svc
```

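The SVS and SVC bridge commands differ only in `--component` and the input paths, so they can be assembled from Python when scripting several runs. The following is a minimal sketch, assuming the script path, environment variables, and flags shown in the commands above, and assuming the script does not care about flag order. `build_bridge_command` and `bridge_env` are hypothetical helper names and are not part of the repository.

```python
import os

BRIDGE_SCRIPT = "scripts/inference_mlx_bridge.py"  # path taken from the README commands


def build_bridge_command(component, model_dir, inputs, save_dir,
                         device="mps", n_steps=1, cfg=1,
                         python=".venv/bin/python"):
    """Return the argv list for one bridge run (hypothetical helper).

    `inputs` maps flag names without the leading dashes to values,
    e.g. {"prompt_wav_path": "example/audio/zh_prompt.mp3"}.
    """
    if component not in ("svs", "svc"):
        raise ValueError(f"unknown component: {component!r}")
    argv = [python, BRIDGE_SCRIPT,
            "--model", str(model_dir),
            "--component", component,
            "--device", device]
    for flag, value in inputs.items():
        argv += [f"--{flag}", str(value)]
    argv += ["--n_steps", str(n_steps), "--cfg", str(cfg),
             "--save_dir", str(save_dir)]
    return argv


def bridge_env(whisper_model="pretrained_models/openai__whisper-base"):
    """Copy the current environment and add the two variables used above."""
    env = dict(os.environ)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    env["SOULX_WHISPER_MODEL"] = whisper_model
    return env


svc_cmd = build_bridge_command(
    "svc",
    "./models/SoulX-Singer-bf16",
    {
        "prompt_wav_path": "example/audio/zh_prompt.mp3",
        "target_wav_path": "example/audio/music.mp3",
        "prompt_f0_path": "example/audio/zh_prompt_f0.npy",
        "target_f0_path": "example/audio/music_f0.npy",
    },
    "outputs_mlx_bridge/svc",
)
print(" ".join(svc_cmd))
```

To execute the command rather than print it, pass it to `subprocess.run(svc_cmd, env=bridge_env(), check=True)` from the repository root. For an SVS run, the `inputs` mapping would instead carry `prompt_metadata_path`, `target_metadata_path`, and `control`.
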
## Variants

| Variant | Disk | Notes |
|---|---:|---|
| [`SoulX-Singer-4bit`](https://huggingface.co/mlx-community/SoulX-Singer-4bit) | 774M | smallest checkpoint; MLX affine quantized |
| [`SoulX-Singer-8bit`](https://huggingface.co/mlx-community/SoulX-Singer-8bit) | 1.4G | smaller checkpoint; MLX affine quantized |
| [`SoulX-Singer-bf16`](https://huggingface.co/mlx-community/SoulX-Singer-bf16) | 2.6G | recommended high-quality baseline |
| [`SoulX-Singer-fp32`](https://huggingface.co/mlx-community/SoulX-Singer-fp32) | 5.6G | full-precision conversion baseline |

## Layout

```text
SoulX-Singer-bf16/
|-- config.json
|-- config.yaml
|-- mlx_manifest.json
|-- svs/
|   |-- model.safetensors.index.json
|   `-- model-00001-of-000xx.safetensors
`-- svc/
    |-- model.safetensors.index.json
    `-- model-00001-of-000xx.safetensors
```

This repo has 11 SVS shards and 11 SVC shards.

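After downloading, the layout and shard counts can be checked with a few lines of standard-library Python. This is a minimal sketch, assuming each `model.safetensors.index.json` follows the usual Hugging Face sharded format, in which a `weight_map` object maps tensor names to shard file names. That assumption is not confirmed by this README, and `summarize_checkpoint` is a hypothetical helper name.

```python
import json
from pathlib import Path

TOP_LEVEL_FILES = ("config.json", "config.yaml", "mlx_manifest.json")
COMPONENTS = ("svs", "svc")


def summarize_checkpoint(root):
    """Report missing files and the shard count per component."""
    root = Path(root)
    missing = [name for name in TOP_LEVEL_FILES if not (root / name).is_file()]
    shards = {}
    for component in COMPONENTS:
        index_path = root / component / "model.safetensors.index.json"
        if not index_path.is_file():
            missing.append(f"{component}/model.safetensors.index.json")
            continue
        index = json.loads(index_path.read_text())
        # Assumption: "weight_map" maps tensor names to shard file names.
        shard_names = sorted(set(index.get("weight_map", {}).values()))
        missing += [f"{component}/{name}" for name in shard_names
                    if not (root / component / name).is_file()]
        shards[component] = len(shard_names)
    return {"missing": missing, "shards": shards}


print(summarize_checkpoint("./models/SoulX-Singer-bf16"))
```

If the assumption about the index format holds, a complete download should report an empty `missing` list and the shard counts stated above.
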
## Validation

Local Apple Silicon validation:

| Test | Result |
|---|---|
| Official PyTorch/MPS SVS, `n_steps=1` | generated 6.71 s WAV, 24 kHz mono |
| `SoulX-Singer-bf16` component load | `svs/` and `svc/` safetensors are indexed and loadable |
| bf16 PyTorch bridge SVS, `n_steps=1` | generated 6.71 s WAV, 24 kHz mono, RMS about `0.029` |

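The duration, sample rate, channel count, and RMS figures in the table can be reproduced for any generated file with the standard library. The sketch below assumes the output is a 16-bit PCM WAV and that RMS is measured relative to full scale (samples divided by 32768). Neither detail is stated in this README, and `describe_wav` is a hypothetical helper name.

```python
import array
import math
import sys
import wave


def describe_wav(path):
    """Return duration, sample rate, channels, and full-scale RMS of a WAV.

    Assumption: 16-bit PCM; other sample widths raise ValueError.
    """
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
        samples = array.array("h", wav.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if samples:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    else:
        rms = 0.0
    return {
        "duration_s": n_frames / rate,
        "sample_rate": rate,
        "channels": channels,
        "rms": rms,
    }


# Example call (the file name is a placeholder, not a path from the repository):
# print(describe_wav("outputs_mlx_bridge/svs/output.wav"))
```
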
For quantized checkpoints, the bridge loader dequantizes MLX affine tensors into the official PyTorch module shapes for compatibility testing. Native all-MLX inference is planned as a later runtime step.

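To illustrate what dequantizing an affine-quantized tensor involves, here is a pure-Python sketch of group-wise affine quantization and its inverse, where each group of weights shares one scale and one bias and a weight is recovered as `scale * code + bias`. It is not the bridge loader's code: the function names are hypothetical, the weights are a flat list rather than a tensor, and the bit-packing that real MLX checkpoints use for the integer codes is omitted.

```python
def quantize_affine(weights, group_size, bits):
    """Quantize a flat list of floats group by group (illustrative only)."""
    if len(weights) % group_size:
        raise ValueError("len(weights) must be a multiple of group_size")
    levels = (1 << bits) - 1
    codes, scales, biases = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # constant group: any nonzero scale works
        codes.extend(round((w - lo) / scale) for w in group)
        scales.append(scale)
        biases.append(lo)
    return codes, scales, biases


def dequantize_affine(codes, scales, biases, group_size):
    """Recover approximate floats: w ~ scale * code + bias for each group."""
    if len(scales) != len(biases) or len(codes) != len(scales) * group_size:
        raise ValueError("expected group_size codes per (scale, bias) pair")
    out = []
    for g, (scale, bias) in enumerate(zip(scales, biases)):
        group = codes[g * group_size:(g + 1) * group_size]
        out.extend(scale * q + bias for q in group)
    return out


weights = [0.0, 0.5, 1.0, 1.5, -1.0, -0.5, 0.0, 0.5]
codes, scales, biases = quantize_affine(weights, group_size=4, bits=4)
restored = dequantize_affine(codes, scales, biases, group_size=4)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

The reconstruction error of each weight is bounded by half of its group's scale, which is why lower bit widths trade accuracy for the smaller checkpoint sizes.
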
## License

The converted weights follow the upstream SoulX-Singer Apache-2.0 release.

## Citation

```bibtex
@misc{soulx-singer-mlx,
  title = {SoulX-Singer-MLX: Apple MLX safetensors port of SoulX-Singer},
  author = {ailuntx},
  year = {2026},
  url = {https://github.com/ailuntx/SoulX-Singer-MLX},
}

@misc{soulxsinger,
  title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis},
  author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
  year={2026},
  eprint={2602.07803},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2602.07803},
}
```