🎵 SoulX-Singer-Preprocess
This part offers a comprehensive singing transcription and editing toolkit for real-world music audio. It provides a complete pipeline, from vocal extraction to high-level annotation, optimized for singing voice synthesis (SVS) dataset construction. By integrating state-of-the-art models, it transforms raw audio into structured singing data and supports customizable creation and editing of lyric-aligned MIDI scores.
✨ Features
The toolkit includes the following core modules:
🎤 Clean Dry Vocal Extraction
Extracts the lead vocal track from polyphonic music audio and applies dereverberation.
Lyrics Transcription
Automatically transcribes lyrics from the clean vocal.
🎶 Note Transcription
Converts the singing voice into note-level representations for SVS.
🎼 MIDI Editor
Supports customizable creation and editing of MIDI scores integrated with lyrics.
Python Environment
Before running the pipeline, set up the Python environment as follows:
Install Conda (if not already installed): https://docs.conda.io/en/latest/miniconda.html
Activate or create a conda environment (Python 3.10 recommended):
If you already have the soulxsinger environment:
conda activate soulxsinger
Otherwise, create it first:
conda create -n soulxsinger -y python=3.10
conda activate soulxsinger
Install dependencies from the preprocess directory:
cd preprocess
pip install -r requirements.txt
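As an optional sanity check after activating the environment, a short script along these lines can confirm the interpreter version and the active conda environment. This is a minimal sketch and not part of the toolkit: the function name check_environment is hypothetical, and it assumes the CONDA_DEFAULT_ENV variable that conda activate normally sets.

```python
import os
import sys


def check_environment(expected_env="soulxsinger", expected_python=(3, 10),
                      version=None, environ=None):
    """Return a list of warnings (empty if the environment matches).

    Hypothetical helper, not shipped with the toolkit. `version` and
    `environ` default to the running interpreter and process environment;
    they are parameters only so the check can be exercised in isolation.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    environ = os.environ if environ is None else environ
    warnings = []
    if version != tuple(expected_python):
        warnings.append(
            f"Python {version[0]}.{version[1]} is active; "
            f"{expected_python[0]}.{expected_python[1]} is recommended."
        )
    # Assumption: conda exports the active environment name in this variable.
    active_env = environ.get("CONDA_DEFAULT_ENV")
    if active_env != expected_env:
        warnings.append(
            f"Active conda environment is {active_env!r}, expected {expected_env!r}."
        )
    return warnings
```

Calling check_environment() with no arguments inspects the current process; printing each returned message before launching the pipeline is usually enough.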
Data Preparation
Before running the pipeline, prepare the following inputs:
- Prompt audio: reference audio that provides timbre and style.
- Target audio: original vocal or music audio to be processed and transcribed.
Configure the corresponding parameters in:
example/preprocess.sh
Typical configuration includes:
- Input / output paths
- Module enable switches
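Since the script takes its input paths from example/preprocess.sh, a quick pre-flight check that the prompt and target audio actually exist can save a failed run. The following is a minimal sketch under that assumption; missing_inputs and the file names in the comment are hypothetical and not part of the toolkit.

```python
from pathlib import Path


def missing_inputs(*audio_paths):
    """Return the given paths that do not point to an existing file."""
    return [str(p) for p in map(Path, audio_paths) if not p.is_file()]


# Example with hypothetical file names:
#   missing = missing_inputs("prompt.wav", "target.mp3")
#   if missing:
#       raise SystemExit(f"Missing input audio: {missing}")
```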
Usage
After configuring preprocess.sh, run the transcription pipeline with:
bash example/preprocess.sh
The script will automatically execute the following steps:
- Vocal separation and dereverberation
- F0 extraction and voice activity detection (VAD)
- Lyrics transcription
- Note transcription
After the pipeline completes, you will obtain SoulX-Singer-style metadata that can be used directly for Singing Voice Synthesis (SVS).
Output paths:
- The final metadata (JSON file) is written to the same directory as your input audio, with the same base filename (e.g. audio.mp3 → audio.json).
- All intermediate results (separated vocal and accompaniment, F0, VAD outputs, etc.) are also saved under the configured save_dir.
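The naming rule above can be sketched in a few lines of Python, which is handy when a downstream script needs to locate the metadata for a given input. This is an illustrative sketch only: the helper names are hypothetical, and it assumes the JSON file is UTF-8 encoded and that "same filename" means the input name with its extension replaced by .json.

```python
import json
from pathlib import Path


def expected_metadata_path(audio_path):
    """Same directory and base name as the input audio, with a .json suffix."""
    return Path(audio_path).with_suffix(".json")


def load_metadata(audio_path):
    """Load the metadata written next to `audio_path`; raise if it is missing."""
    meta_path = expected_metadata_path(audio_path)
    if not meta_path.is_file():
        raise FileNotFoundError(
            f"No metadata found at {meta_path}; did the pipeline finish?"
        )
    with meta_path.open(encoding="utf-8") as f:
        return json.load(f)
```

The sketch makes no assumption about the structure of the metadata itself; it simply returns whatever the JSON file contains.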
⚠️ Important Note
Transcription errors, especially in lyrics and note annotations, can significantly affect the final SVS quality. We strongly recommend manually reviewing and correcting the generated metadata before inference.
To support this, we provide a MIDI Editor for editing lyrics, phoneme alignment, note pitches, and durations. The workflow is:
Export metadata to MIDI → edit in the MIDI Editor → import edited MIDI back to metadata for SVS.
Step 1: Metadata → MIDI (for editing)
Convert SoulX-Singer metadata to a MIDI file so you can open it in the MIDI Editor:
preprocess_root=example/transcriptions/music
python -m preprocess.tools.midi_parser \
--meta2midi \
--meta "${preprocess_root}/metadata.json" \
--midi "${preprocess_root}/vocal.mid"
Step 2: Edit in the MIDI Editor
Open the MIDI Editor (see MIDI Editor Tutorial), load vocal.mid, and correct lyrics, pitches, or durations as needed. Save the result as e.g. vocal_edited.mid.
Step 3: MIDI → Metadata (for SoulX-Singer inference)
Convert the edited MIDI back into SoulX-Singer-style metadata (and cut wavs) for SVS:
python -m preprocess.tools.midi_parser \
--midi2meta \
--midi "${preprocess_root}/vocal_edited.mid" \
--meta "${preprocess_root}/edit_metadata.json" \
--vocal "${preprocess_root}/vocal.wav"
Use edit_metadata.json (and the wavs under edit_cut_wavs) as the target metadata in your inference pipeline.
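When the round trip is scripted, the two conversion commands can be assembled in Python and launched with subprocess. The sketch below is a hedged illustration, not part of the toolkit: the helper names are hypothetical, it uses only the flags shown in Steps 1 and 3, and it assumes the commands run from the repository root so that the preprocess package is importable.

```python
import subprocess
import sys
from pathlib import Path

MODULE = "preprocess.tools.midi_parser"


def meta2midi_cmd(root):
    """Command for Step 1: metadata.json -> vocal.mid under `root`."""
    root = Path(root)
    return [sys.executable, "-m", MODULE, "--meta2midi",
            "--meta", str(root / "metadata.json"),
            "--midi", str(root / "vocal.mid")]


def midi2meta_cmd(root):
    """Command for Step 3: vocal_edited.mid -> edit_metadata.json under `root`."""
    root = Path(root)
    return [sys.executable, "-m", MODULE, "--midi2meta",
            "--midi", str(root / "vocal_edited.mid"),
            "--meta", str(root / "edit_metadata.json"),
            "--vocal", str(root / "vocal.wav")]


def run(cmd):
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


# Example, matching the shell commands above:
#   root = "example/transcriptions/music"
#   run(meta2midi_cmd(root))   # then edit vocal.mid in the MIDI Editor
#   run(midi2meta_cmd(root))
```

Building the argument list explicitly, rather than a shell string, avoids quoting problems when paths contain spaces.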
References & Dependencies
This project builds upon the following excellent open-source works:
Vocal Separation & Dereverberation
🎼 F0 Extraction
Lyrics Transcription (ASR)
🎶 Note Transcription
We sincerely thank the authors of these repositories for their exceptional open-source contributions, which have been fundamental to the development of this toolkit.