---
license: mit
tags:
- music-source-separation
- stem-separation
- onnx
- audio
- hot-step
language:
- en
---

# HOT-Step CPP SuperSep: ONNX Stem Separation Models

Pre-converted ONNX models for multi-stem audio separation in [HOT-Step CPP](https://github.com/scragnog/HOT-Step-CPP). They run natively through ONNX Runtime on the GPU, with no Python required.

## Models

| File | Architecture | Size | Purpose |
|------|-------------|------|---------|
| `bs_roformer_sw.onnx` | BS-Roformer | 672 MB | **Stage 1**: Primary 6-stem split (vocals, drums, bass, guitar, piano, other) |
| `mel_band_roformer_karaoke.onnx` | Mel-Band RoFormer | 875 MB | **Stage 2**: Vocal sub-separation (lead vs. backing) |
| `mdx23c_drumsep.onnx` | MDX23C | 418 MB | **Stage 3**: Drum sub-separation (kick, snare, toms, hi-hat, cymbals) |
| `htdemucs_6s.onnx` | HTDemucs | 105 MB | **Stage 4**: Refinement of the "other" stem |

**Total: ~2.07 GB**
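
The four stages form a cascade: stage 1 splits the mix into six stems, then stages 2, 3, and 4 re-split the vocals, drums, and "other" stems respectively. A minimal sketch of that routing follows, assuming each model can be treated as a function from a signal to named stems. `make_stub`, `run_cascade`, the `parent/child` stem naming, and the stage-4 output name `refined` are all hypothetical (stage 4's actual outputs are not listed here), and the stubs simply divide the signal evenly where the real engine would run the ONNX models.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one ONNX model: signal in, named stems out.
Separator = Callable[[List[float]], Dict[str, List[float]]]


def make_stub(stem_names: List[str]) -> Separator:
    """Placeholder separator that splits a signal evenly across the given
    stems, so its outputs sum back to its input. Illustration only."""
    def separate(signal: List[float]) -> Dict[str, List[float]]:
        share = 1.0 / len(stem_names)
        return {name: [x * share for x in signal] for name in stem_names}
    return separate


def run_cascade(mix: List[float], stage1: Separator, stage2: Separator,
                stage3: Separator, stage4: Separator) -> Dict[str, List[float]]:
    """Stage 1 splits the mix; stages 2-4 replace the vocals, drums, and
    other stems with their sub-stems. Remaining stems pass through as-is."""
    routing = {"vocals": stage2, "drums": stage3, "other": stage4}
    result: Dict[str, List[float]] = {}
    for name, audio in stage1(mix).items():
        refine = routing.get(name)
        if refine is None:
            result[name] = audio  # bass, guitar, piano are kept from stage 1
            continue
        for sub_name, sub_audio in refine(audio).items():
            result[f"{name}/{sub_name}"] = sub_audio
    return result


# Usage with stubs named after the Models table (stage-4 name is hypothetical).
stage1 = make_stub(["vocals", "drums", "bass", "guitar", "piano", "other"])
stage2 = make_stub(["lead", "backing"])
stage3 = make_stub(["kick", "snare", "toms", "hihat", "cymbals"])
stage4 = make_stub(["refined"])
mix = [1.0, -0.5, 0.25]
stems = run_cascade(mix, stage1, stage2, stage3, stage4)
```

With these stubs the final stems still add up to the input, because every stage's outputs sum to what it received; real models only approximate that property.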

## Usage

These models are designed for use with the HOT-Step CPP Model Manager. In the app:

1. Open the **Model Manager** (click "Get More Models" in the Models dropdown).
2. Go to the **Stem Separation** tab.
3. Click **Download** on each model (or use the Stem Separation starter pack).

Models are downloaded to `models/supersep/` and loaded automatically by the SuperSep engine.
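
If a model fails to load, a quick check that all four files landed where the engine looks for them can help. This is optional tooling (the app itself needs no Python); `missing_supersep_models` is a hypothetical helper, and it assumes the four files from the Models table sit directly inside `models/supersep/`, resolved relative to the current working directory.

```python
from pathlib import Path
from typing import List

# File names from the Models table; a flat layout under models/supersep/ is assumed.
SUPERSEP_FILES = [
    "bs_roformer_sw.onnx",
    "mel_band_roformer_karaoke.onnx",
    "mdx23c_drumsep.onnx",
    "htdemucs_6s.onnx",
]


def missing_supersep_models(root: str = "models/supersep") -> List[str]:
    """Return the expected model files that are not present under `root`,
    in table order (hypothetical helper, not part of HOT-Step CPP)."""
    base = Path(root)
    return [name for name in SUPERSEP_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_supersep_models()
    print("All SuperSep models present." if not missing else f"Missing: {', '.join(missing)}")
```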
37
+
38
+ ### Technical Details
39
+
40
+ - **Format**: ONNX (opset 18, legacy TorchScript export)
41
+ - **Precision**: FP32
42
+ - **Input**: Spectrogram representation (STFT performed in C++ engine)
43
+ - **Output**: Separation masks (iSTFT performed in C++ engine)
44
+ - **Runtime**: ONNX Runtime 1.25.1+ with CUDA Execution Provider
45
+
46
+ The models export only the neural network portion β€” STFT/iSTFT operations are handled natively in C++ for optimal performance.
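
Because the ONNX graphs stop at the spectrogram, the surrounding engine has to compute an STFT, apply each predicted mask, and invert the result. The pure-Python sketch below shows that round trip under stated assumptions: a periodic Hann window, `n_fft=8`, `hop=2`, zero padding of `n_fft // 2` on each side, masks multiplied element-wise into a one-sided spectrogram (real-valued in this demo), and weighted overlap-add for the inverse. These values are illustrative, not the SuperSep engine's parameters; a real engine would use an FFT and each checkpoint's own STFT settings. `stft`, `istft`, and `apply_mask` are hypothetical names.

```python
import cmath
import math
from typing import List

Spectrogram = List[List[complex]]  # [frame][bin], one-sided: n_fft // 2 + 1 bins


def hann(n_fft: int) -> List[float]:
    # Periodic Hann window (zero only at n = 0).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def stft(x: List[float], n_fft: int = 8, hop: int = 2) -> Spectrogram:
    w, pad = hann(n_fft), n_fft // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    padded += [0.0] * ((-(len(padded) - n_fft)) % hop)  # last frame ends flush
    frames = []
    for start in range(0, len(padded) - n_fft + 1, hop):
        seg = [padded[start + n] * w[n] for n in range(n_fft)]
        # Naive DFT for clarity; keep only the non-negative frequency bins.
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                           for n in range(n_fft)) for k in range(n_fft // 2 + 1)])
    return frames


def apply_mask(spec: Spectrogram, mask: List[List[complex]]) -> Spectrogram:
    # Element-wise multiplication of the mixture spectrogram by a mask.
    return [[s * m for s, m in zip(fs, fm)] for fs, fm in zip(spec, mask)]


def istft(spec: Spectrogram, length: int, n_fft: int = 8, hop: int = 2) -> List[float]:
    w, half = hann(n_fft), n_fft // 2
    total = (len(spec) - 1) * hop + n_fft
    out, wsum = [0.0] * total, [0.0] * total
    for t, bins in enumerate(spec):
        for n in range(n_fft):
            # Inverse real DFT from the one-sided spectrum (n_fft even).
            acc = bins[0].real + bins[half].real * (-1) ** n
            for k in range(1, half):
                acc += 2 * (bins[k] * cmath.exp(2j * math.pi * k * n / n_fft)).real
            out[t * hop + n] += (acc / n_fft) * w[n]  # synthesis window
            wsum[t * hop + n] += w[n] ** 2
    # Normalise the overlap-add by the summed squared window, then drop padding.
    y = [o / s if s > 1e-12 else 0.0 for o, s in zip(out, wsum)]
    return y[half:half + length]


# Usage: two complementary hypothetical masks stand in for two model outputs.
mix = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(32)]
spec = stft(mix)
low = [[1.0 if k < 2 else 0.0 for k in range(len(frame))] for frame in spec]
high = [[1.0 - m for m in row] for row in low]
stem_a = istft(apply_mask(spec, low), len(mix))
stem_b = istft(apply_mask(spec, high), len(mix))
```

Since `istft` is linear in the spectrogram, a mask of all ones reconstructs the input, and complementary masks that sum to 1 give stems that add back to the mix up to floating-point error.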

## Conversion

These were converted from PyTorch checkpoints using the [MSS_ONNX_TensorRT](https://github.com/ZFTurbo/MSS_ONNX_TensorRT) toolset with `dynamo=False` (the legacy TorchScript exporter) for compatibility with complex attention architectures.

## Attribution & Licenses

### Training & Checkpoints

- **BS-Roformer** checkpoint by [aufr33](https://github.com/jarredou/mss-oracle-list), trained with the Music Source Separation framework
- **Mel-Band RoFormer Karaoke** checkpoint by [aufr33 & viperx](https://github.com/jarredou/mss-oracle-list): SDR 10.1956 on karaoke separation
- **MDX23C DrumSep** checkpoint by [aufr33 & jarredou](https://github.com/jarredou/mss-oracle-list): drum sub-component isolation
- **HTDemucs** by [Meta / Facebook AI Research](https://github.com/facebookresearch/demucs): Hybrid Transformer architecture

### Frameworks & Tools

- **[Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)** by ZFTurbo: training framework for the BS-Roformer, Mel-Band RoFormer, and MDX23C architectures
- **[MSS_ONNX_TensorRT](https://github.com/ZFTurbo/MSS_ONNX_TensorRT)** by ZFTurbo: ONNX conversion tooling with STFT extraction and model validation
- **[Demucs](https://github.com/facebookresearch/demucs)** by Meta Research: HTDemucs architecture and pre-trained weights (MIT License)

### Architecture Papers

- **BS-Roformer**: "Music Source Separation with Band-Split RoPE Transformer" ([arXiv:2309.02612](https://arxiv.org/abs/2309.02612))
- **Mel-Band RoFormer**: Mel-frequency variant of Band-Split RoFormer
- **MDX23C**: Based on the TFC-TDF-UNet v3 architecture
- **HTDemucs**: "Hybrid Transformers for Music Source Separation" ([arXiv:2211.08553](https://arxiv.org/abs/2211.08553))

## License

The conversion and packaging are released under the MIT License. Individual model weights are subject to their original licenses; see the attribution links above for details.