htdemucs_ft: WebGPU / onnxruntime-web build
The htdemucs_ft 4-model fine-tuned ensemble (Meta's Demucs v4), exported to ONNX
so it runs in the browser on WebGPU via onnxruntime-web: no Python, no server.
Built for loukai's in-browser karaoke creator.
What makes this different from other htdemucs ONNX exports
There are several htdemucs ONNX exports on the Hub already, but they are CUDA/CPU
server exports and fail to load on the onnxruntime-web WebGPU execution
provider: they keep the STFT in-graph and contain many ScatterND ops that the
WebGPU EP can't place (verified: session creation throws in transformer_memcpy).
This build is shaped specifically for the browser:
- STFT/iSTFT pulled out of the graph (done in JS), using the real-magnitude input
  contract: mix [1,2,343980] + mag [1,4,2048,336] -> x [1,4,4,2048,336] (freq mask)
  + xt [1,4,2,343980] (time). Masking is applied in JS (see demucs-web).
- fp16 weights for speed/size, with the variance/normalization prologue pinned
  to CPU (forceCpuNodeNames) because that op overflows fp16 on WebGPU and NaNs.
  fp16 is parity-perfect vs fp32 (corr ~1.0).
- Legacy torch.onnx export (opset 17, no dynamo): the dynamo path decomposes ops
  in ways that NaN on WebGPU.
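The tensor contract above can be written down as a small shape table. The sketch below is illustrative only and is not taken from demucs-web or loukai: the helper names (SHAPES, numel, frameCount, checkBuffer) are hypothetical, and the hop length of 1024 is an assumption based on upstream Demucs defaults, under which 343980 samples would give the 336 frames seen in the mag shape.

```typescript
// Shapes quoted from the contract above; the batch dimension is always 1.
const SHAPES = {
  mix: [1, 2, 343980],      // input: stereo waveform segment
  mag: [1, 4, 2048, 336],   // input: spectrogram computed in JS
  x: [1, 4, 4, 2048, 336],  // output: frequency branch, 4 stems
  xt: [1, 4, 2, 343980],    // output: time branch, 4 stems, stereo
} as const;

type TensorName = keyof typeof SHAPES;

// Number of scalar elements in a tensor with the given dimensions.
function numel(dims: readonly number[]): number {
  return dims.reduce((acc, d) => acc * d, 1);
}

// ASSUMPTION: hop length 1024 (upstream Demucs default), frames = ceil(samples / hop).
function frameCount(samples: number, hop: number = 1024): number {
  return Math.ceil(samples / hop);
}

// Throws if a flat buffer does not hold exactly the elements the shape requires.
// Accepts anything with a length so it stays agnostic about the element type.
function checkBuffer(name: TensorName, data: { length: number }): void {
  const expected = numel(SHAPES[name]);
  if (data.length !== expected) {
    throw new RangeError(
      `${name}: expected ${expected} elements for [${SHAPES[name].join(",")}], got ${data.length}`
    );
  }
}
```

Running checkBuffer on the mix and mag buffers before building tensors turns a shape mismatch into a readable error instead of a failure inside the session. For reference, 343980 samples is 7.8 s of audio at 44.1 kHz.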
Files
- htdemucs_ft_{drums,bass,other,vocals}_safe16.onnx: the 4 specialist models (~84 MB each, fp16). Stem k is taken from model k (the bag's one-hot weights).
- ft_cpu_nodes.json: per-stem forceCpuNodeNames lists.
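A minimal sketch of how these files might be wired into per-stem session options. Two things here are assumptions rather than facts from this repo: that ft_cpu_nodes.json is an object keyed by stem name, and that forceCpuNodeNames is set on the WebGPU execution provider entry, which is how recent onnxruntime-web versions expose it (check your version). The helper names and the option interfaces are hypothetical and declared locally so the snippet stands alone.

```typescript
// Stem order follows the file list above.
const STEMS = ["drums", "bass", "other", "vocals"] as const;
type Stem = (typeof STEMS)[number];

// File name pattern taken from the Files list.
function modelFileName(stem: Stem): string {
  return `htdemucs_ft_${stem}_safe16.onnx`;
}

// ASSUMPTION: ft_cpu_nodes.json parses to { drums: [...], bass: [...], ... }.
type CpuNodeMap = Record<Stem, string[]>;

// Local stand-ins for the onnxruntime-web option types (not imported here).
interface WebGpuProviderSketch {
  name: "webgpu";
  forceCpuNodeNames: string[];
}
interface SessionOptionsSketch {
  executionProviders: WebGpuProviderSketch[];
}

// Builds options that run on WebGPU but pin the listed nodes to CPU.
function sessionOptionsFor(stem: Stem, cpuNodes: CpuNodeMap): SessionOptionsSketch {
  return {
    executionProviders: [
      { name: "webgpu", forceCpuNodeNames: [...cpuNodes[stem]] },
    ],
  };
}
```

Under these assumptions, the returned object would be passed as the second argument to ort.InferenceSession.create together with the URL of the matching model file, once per stem.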
Usage
Runs via the loukai-htdemucs-ft ensemble runner
(createEnsembleSessions / runEnsemble) on top of demucs-web for the STFT. See
the loukai repo for the full in-browser pipeline (Demucs + Whisper + CREPE, all
WebGPU).
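As noted under Files, stem k is taken from model k. The sketch below illustrates that selection step only and is not the actual runEnsemble implementation: pickOwnStems is a hypothetical helper, and it assumes each model's final per-stem waveform (after the frequency and time branches have been combined in JS) is available as a flat Float32Array in [4, 2, samples] row-major order, with stems ordered drums, bass, other, vocals.

```typescript
const NUM_STEMS = 4; // drums, bass, other, vocals
const CHANNELS = 2;  // stereo

// modelOutputs[k] is model k's flat [4, 2, samples] waveform output.
// Returns one flat [2, samples] buffer per stem, where stem k comes from model k.
function pickOwnStems(modelOutputs: Float32Array[], samples: number): Float32Array[] {
  if (modelOutputs.length !== NUM_STEMS) {
    throw new RangeError(`expected ${NUM_STEMS} model outputs, got ${modelOutputs.length}`);
  }
  const stemSize = CHANNELS * samples;
  return modelOutputs.map((out, k) => {
    if (out.length !== NUM_STEMS * stemSize) {
      throw new RangeError(`model ${k}: unexpected output length ${out.length}`);
    }
    // In row-major order, stem k occupies one contiguous block of stemSize values.
    return out.slice(k * stemSize, (k + 1) * stemSize);
  });
}
```

This is the one-hot weighting written as a slice: the other three stems from each specialist model are discarded rather than averaged.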
Credit
Models exported from Demucs
(htdemucs_ft, MIT). Export approach builds on the timcsy / gianlourbano
demucs-web-onnx work.