---
license: mit
tags:
- music-source-separation
- stem-separation
- onnx
- audio
- hot-step
language:
- en
---

# HOT-Step CPP SuperSep: ONNX Stem Separation Models

Pre-converted ONNX models for multi-stem audio separation in [HOT-Step CPP](https://github.com/scragnog/HOT-Step-CPP). These run natively via ONNX Runtime GPU, so no Python is required.

## Models

| File | Architecture | Size | Purpose |
|------|--------------|------|---------|
| `bs_roformer_sw.onnx` | BS-Roformer | 672 MB | **Stage 1**: Primary 6-stem split (vocals, drums, bass, guitar, piano, other) |
| `mel_band_roformer_karaoke.onnx` | Mel-Band RoFormer | 875 MB | **Stage 2**: Vocal sub-separation (lead vs backing) |
| `mdx23c_drumsep.onnx` | MDX23C | 418 MB | **Stage 3**: Drum sub-separation (kick, snare, toms, hi-hat, cymbals) |
| `htdemucs_6s.onnx` | HTDemucs | 105 MB | **Stage 4**: "Other" stem refinement |

**Total: ~2.07 GB**
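
For reference, the stage order and file sizes from the table above can be captured in a small C++ table. This is an illustrative sketch only: `StageInfo`, `kSuperSepStages`, and `total_size_mb` are hypothetical names, not HOT-Step CPP's actual types or API. Summing the sizes reproduces the stated total: 672 + 875 + 418 + 105 = 2070 MB, or about 2.07 GB in decimal units.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical description of one SuperSep stage (not HOT-Step CPP's real type).
struct StageInfo {
    int stage;              // pipeline order, 1-based
    std::string_view file;  // ONNX file name from the Models table
    std::string_view arch;  // architecture family
    std::uint32_t size_mb;  // approximate file size in MB
};

// Stages in the order listed in the Models table.
inline constexpr std::array<StageInfo, 4> kSuperSepStages{{
    {1, "bs_roformer_sw.onnx", "BS-Roformer", 672},
    {2, "mel_band_roformer_karaoke.onnx", "Mel-Band RoFormer", 875},
    {3, "mdx23c_drumsep.onnx", "MDX23C", 418},
    {4, "htdemucs_6s.onnx", "HTDemucs", 105},
}};

// Total download size in MB (decimal units: 1 GB = 1000 MB).
constexpr std::uint32_t total_size_mb() {
    std::uint32_t total = 0;
    for (const auto& s : kSuperSepStages) total += s.size_mb;
    return total;
}

static_assert(total_size_mb() == 2070, "672 + 875 + 418 + 105 MB = 2070 MB, about 2.07 GB");
```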

## Usage

These models are designed for use with the HOT-Step CPP Model Manager. In the app:

1. Open the **Model Manager** (click "Get More Models" in the Models dropdown).
2. Go to the **Stem Separation** tab.
3. Click **Download** on each model (or use the Stem Separation starter pack).

Models are downloaded to `models/supersep/` and loaded automatically by the SuperSep engine.
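
A host application could check that the download is complete before starting a separation job. The sketch below assumes all four files sit directly in `models/supersep/` under the names from the Models table; `missing_supersep_models` is a hypothetical helper, not part of HOT-Step CPP, and it only checks that the files exist as regular files, not that they are valid ONNX graphs.

```cpp
#include <array>
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// File names from the Models table. The flat directory layout is an assumption.
inline const std::array<std::string, 4> kSuperSepFiles{
    "bs_roformer_sw.onnx",
    "mel_band_roformer_karaoke.onnx",
    "mdx23c_drumsep.onnx",
    "htdemucs_6s.onnx",
};

// Hypothetical helper: returns the expected model files that are not present
// as regular files under `dir` (e.g. "models/supersep").
std::vector<std::string> missing_supersep_models(const fs::path& dir) {
    std::vector<std::string> missing;
    for (const auto& name : kSuperSepFiles) {
        std::error_code ec;  // non-throwing overload: a missing directory just counts as missing
        if (!fs::is_regular_file(dir / name, ec)) missing.push_back(name);
    }
    return missing;
}
```

For example, an empty result from `missing_supersep_models("models/supersep")` means every stage has a file to load; a non-empty result lists exactly which downloads to retry.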

### Technical Details

- **Format**: ONNX (opset 18, legacy TorchScript export)
- **Precision**: FP32
- **Input**: Spectrogram representation (STFT performed in the C++ engine)
- **Output**: Separation masks (iSTFT performed in the C++ engine)
- **Runtime**: ONNX Runtime 1.25.1+ with the CUDA Execution Provider

The models export only the neural-network portion; STFT and iSTFT are computed natively in C++ rather than inside the ONNX graph, for performance.
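
To illustrate this split of responsibilities, here is a minimal single-frame sketch: the C++ side transforms a frame to the frequency domain, multiplies each bin by a mask (which, in the real pipeline, the ONNX model would produce), and transforms back. It is a simplification under stated assumptions: it uses a naive O(N²) DFT instead of an FFT, omits windowing and overlap-add, and assumes a complex-valued per-bin mask. The function names are hypothetical and this is not the SuperSep engine's code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive forward DFT of one real-valued frame (O(N^2); a real engine would use an FFT).
std::vector<cplx> dft(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> bins(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            bins[k] += frame[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return bins;
}

// Naive inverse DFT. Returns only the real part, which is the whole signal
// when the masked spectrum is conjugate-symmetric (e.g. a real-valued mask).
std::vector<double> idft(const std::vector<cplx>& bins) {
    const std::size_t n = bins.size();
    const double pi = std::acos(-1.0);
    std::vector<double> frame(n, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        cplx acc{};
        for (std::size_t k = 0; k < n; ++k)
            acc += bins[k] * std::polar(1.0, 2.0 * pi * double(k * t) / double(n));
        frame[t] = acc.real() / double(n);
    }
    return frame;
}

// Multiply the mixture spectrum by a per-bin mask (standing in for the model output).
std::vector<cplx> apply_mask(const std::vector<cplx>& mix, const std::vector<cplx>& mask) {
    assert(mix.size() == mask.size());
    std::vector<cplx> out(mix.size());
    for (std::size_t k = 0; k < mix.size(); ++k) out[k] = mix[k] * mask[k];
    return out;
}

// One frame through the transform -> mask -> inverse-transform boundary.
std::vector<double> separate_frame(const std::vector<double>& frame,
                                   const std::vector<cplx>& mask) {
    return idft(apply_mask(dft(frame), mask));
}
```

With an all-ones mask the frame is reconstructed unchanged, and keeping only bin 0 leaves the frame mean; in the actual engine the mask tensor would instead come from an ONNX Runtime session.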

## Conversion

These were converted from PyTorch checkpoints using the [MSS_ONNX_TensorRT](https://github.com/ZFTurbo/MSS_ONNX_TensorRT) toolset with `dynamo=False` (the legacy TorchScript-based exporter in `torch.onnx.export`) for compatibility with complex attention architectures.

## Attribution & Licenses

### Training & Checkpoints

- **BS-Roformer** checkpoint by [aufr33](https://github.com/jarredou/mss-oracle-list): trained with the Music Source Separation framework
- **Mel-Band RoFormer Karaoke** checkpoint by [aufr33 & viperx](https://github.com/jarredou/mss-oracle-list): SDR 10.1956 on karaoke separation
- **MDX23C DrumSep** checkpoint by [aufr33 & jarredou](https://github.com/jarredou/mss-oracle-list): drum sub-component isolation
- **HTDemucs** by [Meta / Facebook AI Research](https://github.com/facebookresearch/demucs): Hybrid Transformer architecture

### Frameworks & Tools

- **[Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)** by ZFTurbo: training framework for BS-Roformer, Mel-Band RoFormer, and MDX23C architectures
- **[MSS_ONNX_TensorRT](https://github.com/ZFTurbo/MSS_ONNX_TensorRT)** by ZFTurbo: ONNX conversion tooling with STFT extraction and model validation
- **[Demucs](https://github.com/facebookresearch/demucs)** by Meta Research: HTDemucs architecture and pre-trained weights (MIT License)

### Architecture Papers

- **BS-Roformer**: "Music Source Separation with Band-Split RoPE Transformer" ([arXiv:2309.02612](https://arxiv.org/abs/2309.02612))
- **Mel-Band RoFormer**: Variant of Band-Split RoFormer that groups frequency bins into overlapping mel-scale bands
- **MDX23C**: Based on the TFC-TDF-UNet v3 architecture
- **HTDemucs**: "Hybrid Transformers for Music Source Separation" ([arXiv:2211.08553](https://arxiv.org/abs/2211.08553))
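
The main difference between the two RoFormer variants is how the frequency axis is grouped into bands before the transformer layers. The sketch below is a deliberate simplification: it computes band edges evenly spaced on the mel scale using the common HTK-style formula m = 2595 log10(1 + f/700). The actual Mel-Band RoFormer derives overlapping bands from a mel filterbank, while BS-RoFormer uses fixed, non-overlapping bands, so treat `mel_band_edges` and its parameters as hypothetical illustrations rather than either model's configuration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// HTK-style mel scale conversion and its inverse.
double hz_to_mel(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }
double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Returns num_bands + 1 band edges in Hz, evenly spaced on the mel scale
// from 0 Hz up to the Nyquist frequency (sample_rate / 2).
std::vector<double> mel_band_edges(double sample_rate, std::size_t num_bands) {
    const double max_mel = hz_to_mel(sample_rate / 2.0);
    std::vector<double> edges(num_bands + 1);
    for (std::size_t i = 0; i <= num_bands; ++i)
        edges[i] = mel_to_hz(max_mel * double(i) / double(num_bands));
    return edges;
}
```

Because the mel scale is roughly logarithmic above a few hundred hertz, the low bands come out much narrower than the high ones, giving the model finer frequency resolution where pitch and harmonic detail matter most.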

## License

The conversion and packaging are released under the MIT License. Individual model weights are subject to their original training licenses; see the attribution links above for details.