bgkb committed · Commit d3ba0ea · verified · 1 Parent(s): 1a0d32f

Upload README.md with huggingface_hub

Files changed (1): README.md (+100, -0)
---
license: mit
tags:
- music-source-separation
- vocal-separation
- onnx
- webgpu
- audio
pipeline_tag: audio-to-audio
library_name: onnxruntime
base_model: ZFTurbo/Music-Source-Separation-Training
---

# BS PolarFormer – ONNX Vocal Separation

ONNX conversion of the **BS PolarFormer** vocal separation model from
[Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training).

BS PolarFormer is a BSRoformer architecture with **PoPE** (Polar Positional Embeddings)
instead of rotary embeddings. It separates **vocals** from **other** (instrumental) in stereo audio at 44.1 kHz.

## Files

| File | Size | Description |
|------|------|-------------|
| `bs_polarformer.onnx` | 201 MB | FP32 ONNX model (core: band split → transformers → mask estimator) |
| `bs_polarformer_fp16.onnx` | 103 MB | FP16 model (weights cast to float16, near-identical quality) |
| `model_bs_polarformer_float16.yaml` | 3.6 KB | Model config |
| `convert_to_onnx.py` | 19 KB | Conversion script (PyTorch → ONNX) |
| `run_onnx_inference.py` | 7 KB | CLI inference script |
| `index.html` | 18 KB | Web app (runs in the browser via WebGPU/WASM) |

## Architecture

The ONNX model contains only the **core neural network** (51M parameters):

```
Audio → [STFT] → Core Model (ONNX) → [Mask] → [iSTFT] → Vocals
                 ├─ BandSplit (60 frequency bands)
                 ├─ 12× (TimeTransformer + FreqTransformer)
                 │   └─ 8-head attention, dim=256, PoPE embeddings
                 └─ MaskEstimator (2-layer MLP per band)
```

STFT/iSTFT are handled outside the ONNX model (in PyTorch or JavaScript).

**Input:** `(batch, time_frames, 4100)` – interleaved stereo STFT features (1025 freq bins × 2 channels × 2 real/imag)

**Output:** `(batch, 1, 2050, time_frames, 2)` – complex mask

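The input packing can be sketched at the shape level as follows. This is an illustration only: the exact interleaving order is an assumption here, and the reference packing lives in `run_onnx_inference.py`.

```python
import numpy as np

# Shape-level sketch of building the core model's input from stereo audio.
# The exact feature ordering is an assumption; see run_onnx_inference.py
# for the reference implementation.
n_fft, hop = 2048, 512                       # 2048 // 2 + 1 = 1025 freq bins
audio = np.random.randn(2, 44100).astype(np.float32)   # (channels, samples)

frames = 1 + (audio.shape[1] - n_fft) // hop
window = np.hanning(n_fft)
spec = np.stack([
    np.stack([np.fft.rfft(window * ch[i * hop:i * hop + n_fft])
              for i in range(frames)], axis=-1)
    for ch in audio
])                                            # (2, 1025, frames), complex

# (2 channels x 1025 bins x 2 real/imag) -> 4100 features per time frame
feats = np.stack([spec.real, spec.imag], axis=-1)        # (2, 1025, frames, 2)
model_input = feats.transpose(2, 1, 0, 3).reshape(frames, -1)[None]
model_input = model_input.astype(np.float32)             # (1, frames, 4100)
print(model_input.shape)                                 # (1, 83, 4100)
```

Note that 4100 = 1025 bins × 2 channels × 2 (real/imag), matching the input spec above.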
## Quality (vs PyTorch reference)

| | FP32 ONNX | FP16 ONNX |
|---|---|---|
| Mask max abs diff | ~1e-7 | ~4e-5 |
| Audio SNR | 107 dB | 48.6 dB |
| Pearson correlation | 1.00000000 | 0.99999642 |
| Model size | 201 MB | 103 MB |

Both are perceptually identical to the PyTorch model. The original model achieves **SDR 11.00** on vocals (Multisong Dataset).

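For reference, the SNR rows compare ONNX output audio against the PyTorch output. SNR in dB can be computed with a few lines of NumPy (a generic sketch with synthetic stand-in signals, not code from this repo):

```python
import numpy as np

def snr_db(reference: np.ndarray, test: np.ndarray) -> float:
    """Signal-to-noise ratio of `test` relative to `reference`, in dB."""
    noise = reference - test
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
ref = rng.standard_normal(44100)                   # stand-in for PyTorch output
approx = ref + 1e-3 * rng.standard_normal(44100)   # stand-in for ONNX output
print(f"{snr_db(ref, approx):.1f} dB")             # roughly 60 dB
```

By this measure, the FP16 model's 48.6 dB SNR means its output differs from the reference by noise some four orders of magnitude quieter than the signal, well below audibility.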
## Usage

### Python (ONNX Runtime)

```bash
pip install onnxruntime librosa soundfile pyyaml einops torch

# Download this repo, then:
python run_onnx_inference.py song.mp3 --output_dir output/
python run_onnx_inference.py song.mp3 --fp16   # use the smaller model
```

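Under the hood, `run_onnx_inference.py` wraps an ONNX Runtime session. A minimal sketch of calling the core model directly on dummy features (the file-existence guard and dummy input are for illustration; the input name is read from the model rather than assumed):

```python
from pathlib import Path

import numpy as np

MODEL = "bs_polarformer.onnx"          # or "bs_polarformer_fp16.onnx"

# Dummy core-model input: 100 frames of interleaved stereo STFT features.
x = np.random.randn(1, 100, 4100).astype(np.float32)

if Path(MODEL).exists():               # guard so the sketch is safe to run anywhere
    import onnxruntime as ort          # pip install onnxruntime
    sess = ort.InferenceSession(MODEL, providers=["CPUExecutionProvider"])
    name = sess.get_inputs()[0].name   # input name read from the model
    (mask,) = sess.run(None, {name: x})
    print(mask.shape)                  # expected per the spec: (1, 1, 2050, 100, 2)
```

Real inference also needs the STFT front end and mask/iSTFT back end, which the CLI script provides.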
### Browser (WebGPU)

Serve the files with any HTTP server and open `index.html`:

```bash
python -m http.server 8080
# Open http://localhost:8080
```

Drop in an audio file, select FP32 or FP16, and click "Separate Vocals". The app uses WebGPU when available and falls back to WASM otherwise.

### Convert from scratch

```bash
# Download the checkpoint
wget https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v1.0.20/model_bs_polarformer_float16.ckpt

# Convert
python convert_to_onnx.py          # FP32 only
python convert_to_onnx.py --fp16   # FP32 + FP16
```

## Credits

- Original model & training: [ZFTurbo/Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
- BSRoformer architecture: [lucidrains](https://github.com/lucidrains)
- PoPE embeddings: [PoPE_pytorch](https://pypi.org/project/PoPE-pytorch/)