# 🎵 SoulX-Singer-Preprocess
This component provides a comprehensive **singing transcription and editing toolkit** for real-world music audio. It covers the full pipeline from vocal extraction to high-level annotation, optimized for SVS (Singing Voice Synthesis) dataset construction. By integrating state-of-the-art models, it transforms raw audio into structured singing data and supports the **customizable creation and editing of lyric-aligned MIDI scores**.
## ✨ Features
The toolkit includes the following core modules:
- 🎤 **Clean Dry Vocal Extraction**
Extracts the lead vocal track from polyphonic music audio and removes reverberation.
- 📝 **Lyrics Transcription**
Automatically transcribes lyrics from the clean vocal track.
- 🎶 **Note Transcription**
Converts singing voice into note-level representations for SVS.
- 🎼 **MIDI Editor**
Supports customizable creation and editing of MIDI scores integrated with lyrics.
## 🔧 Python Environment
Before running the pipeline, set up the Python environment as follows:
1. **Install Conda** (if not already installed): https://docs.conda.io/en/latest/miniconda.html
2. **Activate or create a conda environment** (recommended Python 3.10):
- If you already have the `soulxsinger` environment:
```bash
conda activate soulxsinger
```
- Otherwise, create it first:
```bash
conda create -n soulxsinger -y python=3.10
conda activate soulxsinger
```
3. **Install dependencies** from the `preprocess` directory:
```bash
cd preprocess
pip install -r requirements.txt
```
## 📁 Data Preparation
Before running the pipeline, prepare the following inputs:
- **Prompt audio**
Reference audio that provides the timbre and style.
- **Target audio**
Original vocal or music audio to be processed and transcribed.
Configure the corresponding parameters in:
```
example/preprocess.sh
```
Typical configuration includes:
- Input / output paths
- Module enable switches
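The authoritative variable names live in `example/preprocess.sh` itself; as a purely illustrative sketch (every variable name below is a hypothetical placeholder, not the script's actual ones), a configuration might look like:

```shell
# Hypothetical configuration sketch -- check example/preprocess.sh for the
# real variable names; these are illustrative placeholders only.
target_audio=/path/to/target_audio.mp3   # audio to be processed and transcribed
prompt_audio=/path/to/prompt_audio.mp3   # reference audio for timbre and style
save_dir=/path/to/save_dir               # where intermediate results are written

# Module enable switches (1 = run the stage, 0 = skip it)
run_separation=1
run_f0_vad=1
run_lyrics=1
run_notes=1
```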
## 🚀 Usage
After configuring `preprocess.sh`, run the transcription pipeline with:
```bash
bash example/preprocess.sh
```
The script will automatically execute the following steps:
1. **Vocal separation and dereverberation**
2. **F0 extraction and voice activity detection (VAD)**
3. **Lyrics transcription**
4. **Note transcription**
---
After the pipeline completes, you will obtain **SoulX-Singer–style metadata** that can be directly used for Singing Voice Synthesis (SVS).
**Output paths:**
- The final metadata (**JSON file**) is written **in the same directory as your input audio**, with the **same filename** (e.g. `audio.mp3` → `audio.json`)
- All **intermediate results** (separated vocal and accompaniment, F0, VAD outputs, etc.) are also saved under the configured **`save_dir`**.
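The output-naming convention above can be expressed as a small helper (a sketch for clarity; `metadata_path` is a hypothetical name, not part of the toolkit):

```python
from pathlib import Path

def metadata_path(audio_path: str) -> Path:
    """Return where the pipeline writes the final metadata JSON:
    the same directory and filename as the input audio, with a .json suffix."""
    return Path(audio_path).with_suffix(".json")

# e.g. /songs/audio.mp3 -> /songs/audio.json
```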
⚠️ **Important Note**
Transcription errors—especially in **lyrics** and **note annotations**—can significantly affect the final SVS quality. We **strongly recommend manually reviewing and correcting** the generated metadata before inference.
To support this, we provide a **MIDI Editor** for editing lyrics, phoneme alignment, note pitches, and durations. The workflow is:
**Export metadata to MIDI** → edit in the MIDI Editor → **Import edited MIDI back to metadata** for SVS.
---
#### Step 1: Metadata → MIDI (for editing)
Convert SoulX-Singer metadata to a MIDI file so you can open it in the MIDI Editor:
```bash
preprocess_root=example/transcriptions/music
python -m preprocess.tools.midi_parser \
--meta2midi \
--meta "${preprocess_root}/metadata.json" \
--midi "${preprocess_root}/vocal.mid"
```
#### Step 2: Edit in the MIDI Editor
Open the MIDI Editor (see [MIDI Editor Tutorial](tools/midi_editor/README.md)), load `vocal.mid`, and correct lyrics, pitches, or durations as needed. Save the result as e.g. `vocal_edited.mid`.
#### Step 3: MIDI → Metadata (for SoulX-Singer inference)
Convert the edited MIDI back into SoulX-Singer-style metadata (and cut WAV segments) for SVS:
```bash
python -m preprocess.tools.midi_parser \
--midi2meta \
--midi "${preprocess_root}/vocal_edited.mid" \
--meta "${preprocess_root}/edit_metadata.json" \
--vocal "${preprocess_root}/vocal.wav"
```
Use `edit_metadata.json` (and the wavs under `edit_cut_wavs`) as the target metadata in your inference pipeline.
## 🔗 References & Dependencies
This project builds upon the following excellent open-source works:
### 🎧 Vocal Separation & Dereverberation
- [Music Source Separation Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
- [Lead Vocal Separation](https://huggingface.co/becruily/mel-band-roformer-karaoke)
- [Vocal Dereverberation](https://huggingface.co/anvuew/dereverb_mel_band_roformer)
### 🎼 F0 Extraction
- [RMVPE](https://github.com/Dream-High/RMVPE)
### 📝 Lyrics Transcription (ASR)
- [Paraformer](https://modelscope.cn/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch)
- [Parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2)
### 🎶 Note Transcription
- [ROSVOT](https://github.com/RickyL-2000/ROSVOT)
We sincerely thank the authors of these repositories for their exceptional open-source contributions, which have been fundamental to the development of this toolkit.