---
language: en
license: mit
library_name: onnxruntime
pipeline_tag: audio-to-audio
tags:
  - onnx
  - onnxruntime
  - stem-separation
  - source-separation
  - vocal-remover
  - karaoke
  - acapella
  - demucs
  - htdemucs
  - music
  - audio-to-audio
  - mobile
  - ios
  - android
  - coreml
  - directml
  - production-ready
datasets:
  - StemSplitio/stem-separation-benchmark-2026
inference: false
---

# HT-Demucs (single-file 4-stem) — ONNX

The **first ONNX export of the standard `htdemucs` (non-FT) model** on
the Hugging Face Hub. Runs in `onnxruntime` on CPU out of the box, and
on CoreML / CUDA / DirectML with a one-line provider change.
**No PyTorch required at inference.**

This repo is the single-file companion to
[`StemSplitio/htdemucs-ft-onnx`](https://huggingface.co/StemSplitio/htdemucs-ft-onnx).
You get all 4 stems out of one 316 MB `.onnx` file (`htdemucs.onnx`),
or out of a 166 MB file if you grab the fp16-weights variant. The FT bag
gives higher quality; this single model loads 1 session instead of 4 and
runs roughly 4× faster on CPU (see Performance).

---

## TL;DR

```bash
# 316 MB fp32 model:
pip install onnxruntime numpy soundfile
python infer.py your-song.mp3 ./out/ --write-all-stems
# writes ./out/{drums,bass,other,vocals}.wav at 44.1 kHz stereo

# 166 MB fp16weights variant (same runtime cost):
python infer.py your-song.mp3 ./out/ --small --write-all-stems
```

The repo contains:

- `htdemucs.onnx` — 316 MB, opset 17, parity-verified vs PyTorch fp32.
- `htdemucs_fp16weights.onnx` — 166 MB, fp16-stored weights, same runtime memory / latency.
- `infer.py` — pure-numpy reference inference (~200 lines, no torch).
- `requirements.txt` — three small packages, no PyTorch.

---

## Quality

The official `htdemucs` model is the precursor to `htdemucs_ft` — same
architecture, single set of weights instead of 4 specialist sub-models.
On MUSDB18-HQ:

| Metric | `htdemucs` (this) | `htdemucs_ft` (4-bag) |
|---|---:|---:|
| Median vocals SDR | ~8.8 dB | **9.19 dB** |
| Median drums SDR  | ~9.5 dB | **10.11 dB** |
| Total model size  | **316 MB** | 1.26 GB |
| Sessions to load  | **1**     | 4 |
| Speed vs the bag (CPU) | **~4× faster** | baseline |

Parity vs PyTorch fp32 (random input, 7.8 s segment):

- `htdemucs.onnx` max abs diff: **6.62 × 10⁻⁴**
- `htdemucs_fp16weights.onnx` max abs diff (vs fp32 weights): **4.6 × 10⁻⁵**

Both are well within the 1 × 10⁻³ threshold required for publishing.
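
The parity figures above are maximum absolute differences between output tensors. A minimal sketch of the same check between the two ONNX files, using only the `mix`/`stems` names from the I/O spec and assuming uniform random input in [-1, 1] (the exact random input behind the published numbers is not specified here). Comparing against PyTorch itself would additionally need `torch` and the original model, which this sketch leaves out:

```python
import numpy as np

PUBLISH_THRESHOLD = 1e-3  # the bound quoted above
SEGMENT = 343_980         # samples per channel in one `mix` segment


def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise absolute difference between two stem tensors."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(diff)))


def passes_parity(reference: np.ndarray, candidate: np.ndarray,
                  threshold: float = PUBLISH_THRESHOLD) -> bool:
    return max_abs_diff(reference, candidate) < threshold


def compare_onnx_models(path_a: str, path_b: str, seed: int = 0) -> float:
    """Run both ONNX files on the same random segment and return the max abs diff."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    rng = np.random.default_rng(seed)
    mix = rng.uniform(-1.0, 1.0, size=(1, 2, SEGMENT)).astype(np.float32)
    outputs = []
    for path in (path_a, path_b):
        sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
        (stems,) = sess.run(["stems"], {"mix": mix})
        outputs.append(stems)
    return max_abs_diff(outputs[0], outputs[1])
```

For example, `compare_onnx_models("htdemucs.onnx", "htdemucs_fp16weights.onnx")` returns a single float that can be checked against `PUBLISH_THRESHOLD`.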

---

## Performance

Single 7.8 s segment, Apple M4 Pro CPU:

| Variant | RAM | Latency | RTF (latency ÷ 7.8 s) |
|---|---:|---:|---:|
| `htdemucs.onnx` (fp32) | ~1.1 GB | ~1.6 s | 0.20 |
| `htdemucs_fp16weights.onnx` | ~1.1 GB | ~1.6 s | 0.20 |
| For comparison: `htdemucs_ft` (4-session bag) | ~4.0 GB | ~6.4 s | 0.82 |

On real GPUs, the CUDA, DirectML, and CoreML execution providers (EPs) are typically ≥ 5× faster than these CPU numbers.
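
RTF here is wall-clock latency divided by the audio duration of one segment (343980 samples ÷ 44100 Hz = 7.8 s), so values below 1 mean faster than real time. A rough sketch for reproducing such numbers on your own hardware; the warm-up run, the median over 3 runs, and the all-zeros input are choices of this sketch, not necessarily how the table above was measured:

```python
import time

import numpy as np

SAMPLE_RATE = 44_100
SEGMENT_SAMPLES = 343_980  # 7.8 s at 44.1 kHz, fixed by the model's input shape


def real_time_factor(latency_s: float, num_samples: int = SEGMENT_SAMPLES,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """RTF = processing time / audio duration; < 1 means faster than real time."""
    return latency_s / (num_samples / sample_rate)


def benchmark_rtf(model_path: str, providers=("CPUExecutionProvider",), runs: int = 3) -> float:
    """Median RTF of one segment through an onnxruntime session."""
    import onnxruntime as ort  # imported lazily; only needed for the actual benchmark

    sess = ort.InferenceSession(model_path, providers=list(providers))
    mix = np.zeros((1, 2, SEGMENT_SAMPLES), dtype=np.float32)
    sess.run(None, {"mix": mix})  # warm-up: the first run pays one-off allocation costs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        sess.run(None, {"mix": mix})
        timings.append(time.perf_counter() - start)
    return real_time_factor(float(np.median(timings)))
```

Plugging in the table's own latency, `real_time_factor(1.6)` gives about 0.205.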

---

## Quick start

### Python

```python
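import soundfile as sf
import infer  # infer.py from this repo

# soundfile returns (frames, channels); infer.separate takes (channels, frames).
# MP3 input needs a libsndfile build with MP3 support (libsndfile >= 1.1.0).
audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
stems = infer.separate(audio.T, sr,
                       model_path=infer.DEFAULT_MODEL,
                       providers=["CPUExecutionProvider"])
# One (channels, frames) array per stem: drums, bass, other, vocals
for stem, arr in stems.items():
    sf.write(f"{stem}.wav", arr.T, sr)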
```

### CLI

```bash
python infer.py your-song.mp3 ./out/ --write-all-stems
python infer.py your-song.mp3 ./out/ --providers coreml   # macOS arm64
python infer.py your-song.mp3 ./out/ --providers cuda     # Linux + NVIDIA
python infer.py your-song.mp3 ./out/ --providers dml      # Windows + DX12
python infer.py your-song.mp3 ./out/ --small              # 166 MB variant
```
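
When calling onnxruntime directly instead of through `infer.py`, the provider list is the only thing that changes between backends. A minimal sketch, assuming a preference order of CUDA, then DirectML, then Core ML, with the CPU EP always kept as a fallback; the order and the `pick_providers` helper are illustrative, not part of this repo:

```python
# Execution-provider names as registered by onnxruntime
GPU_PREFERENCE = (
    "CUDAExecutionProvider",     # Linux/Windows + NVIDIA (onnxruntime-gpu wheel)
    "DmlExecutionProvider",      # Windows + DirectX 12 (onnxruntime-directml wheel)
    "CoreMLExecutionProvider",   # macOS / iOS
)


def pick_providers(available: list[str]) -> list[str]:
    """Return the available GPU EPs in preference order, always ending with the CPU EP."""
    chosen = [p for p in GPU_PREFERENCE if p in available]
    chosen.append("CPUExecutionProvider")
    return chosen
```

Usage: `ort.InferenceSession("htdemucs.onnx", providers=pick_providers(ort.get_available_providers()))`. onnxruntime tries the providers in list order, so any op a GPU EP cannot handle falls back to the CPU EP.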

### Mobile / Web (`onnxruntime-objc` via CocoaPods, `onnxruntime-web` via npm)

```swift
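// iOS / Swift (pod 'onnxruntime-objc'); run inside a throwing function
import onnxruntime_objc

let env = try ORTEnv(loggingLevel: ORTLoggingLevel.warning)
let opts = try ORTSessionOptions()
// Route supported ops to Core ML; the rest fall back to the CPU EP
try opts.appendCoreMLExecutionProvider(with: ORTCoreMLExecutionProviderOptions())
let session = try ORTSession(env: env,
    modelPath: Bundle.main.path(forResource: "htdemucs", ofType: "onnx")!,
    sessionOptions: opts)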
```

```js
// Browser / web (npm install onnxruntime-web)
import * as ort from "onnxruntime-web";
const sess = await ort.InferenceSession.create("htdemucs_fp16weights.onnx", {
  executionProviders: ["wasm"],
});
// audioBuffer: Float32Array of 2 * 343980 samples, channel-planar (all left, then all right)
const t = new ort.Tensor("float32", audioBuffer, [1, 2, 343980]);
const out = await sess.run({ mix: t });   // out.stems.dims is [1, 4, 2, 343980]
```

For a turnkey browser demo with file-picker + chunked overlap-add, see
[`demucs-onnx browser-demo`](https://github.com/StemSplit/demucs-onnx#browser-demos).

---

## Input / output spec

| Tensor | Name | Shape | Dtype | Notes |
|---|---|---|---|---|
| Input | `mix` | `(1, 2, 343980)` | float32 | Stereo, 44.1 kHz, 7.8 s segment. Values in [-1, 1]. |
| Output | `stems` | `(1, 4, 2, 343980)` | float32 | Stems in order `[drums, bass, other, vocals]`. All 4 are genuine predictions (unlike the FT specialists, each of which is only used for its own stem). |
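
In memory, `mix` is channel-planar: index `[0, c, n]` is sample `n` of channel `c`, so the flat buffer holds all left samples followed by all right samples (the layout the `onnxruntime-web` snippet above expects for `audioBuffer`). A sketch of packing one segment from a `soundfile`-style `(frames, channels)` array and unpacking the output; zero-padding short input, duplicating mono to stereo, and dropping extra channels are assumptions of this sketch:

```python
import numpy as np

SEGMENT = 343_980                                # samples per channel expected by `mix`
STEM_NAMES = ("drums", "bass", "other", "vocals")  # output order from the table above


def to_mix_tensor(audio: np.ndarray) -> np.ndarray:
    """(frames, channels) or (frames,) -> float32 (1, 2, SEGMENT), zero-padded or cropped."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:                 # mono without a channel axis
        audio = audio[:, None]
    if audio.shape[1] == 1:             # duplicate mono to stereo (assumption)
        audio = np.repeat(audio, 2, axis=1)
    elif audio.shape[1] > 2:            # keep the first two channels (assumption)
        audio = audio[:, :2]
    planar = audio.T                    # (2, frames): channel-planar layout
    out = np.zeros((2, SEGMENT), dtype=np.float32)
    n = min(planar.shape[1], SEGMENT)
    out[:, :n] = planar[:, :n]
    return out[None]                    # add the batch axis -> (1, 2, SEGMENT)


def stems_to_dict(stems: np.ndarray) -> dict:
    """Split a (1, 4, 2, T) `stems` output into {name: (2, T) array}."""
    return {name: stems[0, i] for i, name in enumerate(STEM_NAMES)}
```

`to_mix_tensor(x).ravel()` yields exactly the flat planar buffer that the browser `ort.Tensor` call needs.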

For longer audio, chunk with overlap-add — see `infer.py::separate` for
a working 60-line implementation.
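
Overlap-add runs the model on overlapping 7.8 s windows and blends the predictions where windows overlap, which hides seams at segment boundaries. A minimal numpy sketch with a hypothetical `run_segment` callable standing in for the ONNX session; the 25% overlap and the triangular crossfade weights are assumptions and may differ from what `infer.py::separate` does:

```python
from typing import Callable

import numpy as np


def overlap_add_separate(
    mix: np.ndarray,
    run_segment: Callable[[np.ndarray], np.ndarray],
    segment: int = 343_980,
    overlap: float = 0.25,
    num_stems: int = 4,
) -> np.ndarray:
    """Separate a (channels, T) mix of any length; returns (num_stems, channels, T).

    `run_segment` maps a (1, channels, segment) float32 array to
    (1, num_stems, channels, segment), like one call to the ONNX session.
    """
    channels, total = mix.shape
    stride = max(1, int(segment * (1.0 - overlap)))
    # Triangular weights 1, 2, ..., peak, ..., 2, 1: never zero, so every sample is covered.
    half = segment // 2
    weight = np.concatenate(
        [np.arange(1, half + 1), np.arange(segment - half, 0, -1)]
    ).astype(np.float32)

    out = np.zeros((num_stems, channels, total), dtype=np.float32)
    norm = np.zeros(total, dtype=np.float32)
    for start in range(0, total, stride):
        chunk = mix[:, start:start + segment]
        length = chunk.shape[1]
        padded = np.zeros((1, channels, segment), dtype=np.float32)
        padded[0, :, :length] = chunk            # zero-pad the final, shorter window
        pred = run_segment(padded)[0, :, :, :length]
        w = weight[:length]
        out[:, :, start:start + length] += pred * w
        norm[start:start + length] += w
        if start + segment >= total:             # this window already reaches the end
            break
    return out / norm                            # weighted average where windows overlap
```

With onnxruntime, `run_segment` could be `lambda x: sess.run(["stems"], {"mix": x})[0]`, and `mix` a `(2, frames)` float32 array at 44.1 kHz.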

---

## Tooling — `demucs-onnx` Python package

This model can be run (and re-exported from PyTorch) via the open-source
[`demucs-onnx`](https://github.com/StemSplit/demucs-onnx) Python package
on PyPI. It auto-downloads from this repo on first use, so you don't
have to clone or wrangle file paths.

```bash
pip install demucs-onnx

# Single-file 4-stem flavor (this repo):
demucs-onnx separate song.mp3 stems/ --model htdemucs

# Python API:
python -c "from demucs_onnx import separate; \
  print(separate('song.mp3', model='htdemucs').keys())"
```

To reproduce the export yourself from the PyTorch weights:

```bash
pip install 'demucs-onnx[export]'
demucs-onnx export htdemucs out/htdemucs.onnx
```

---

## How it was built

The export pipeline lives in the open-source
[`demucs-onnx`](https://github.com/StemSplit/demucs-onnx) package at
[`demucs_onnx/export/`](https://github.com/StemSplit/demucs-onnx/tree/main/src/demucs_onnx/export).
It applies four patches to make `torch.onnx.export` work on htdemucs:

1. Complex-typed `torch.stft` outputs → `Conv1d` with sin/cos kernels.
2. `model.segment` stored as a `fractions.Fraction` → plain `float`.
3. `random.randrange` in transformer pos-embedding → hardcoded `shift=0`.
4. `aten::_native_multi_head_attention` (no ONNX symbolic) → drop-in
   `nn.MultiheadAttention.forward` built from `Linear`/`bmm`/`softmax`.
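
Patch 1 works because each STFT bin of a frame is a dot product of the windowed signal with a cosine and a sine basis vector. That is exactly what a `Conv1d` with fixed kernels of size `n_fft` and `stride=hop` computes, and it needs only real-valued ops that ONNX supports. A numpy sketch of that equivalence; the periodic Hann window matches `torch.hann_window`'s default, and the reflect padding of `torch.stft(center=True)` is left out (it can be added with `np.pad` beforehand):

```python
import numpy as np


def stft_kernels(n_fft: int) -> tuple[np.ndarray, np.ndarray]:
    """Real/imaginary DFT kernels with a Hann window folded in, each (n_fft//2 + 1, n_fft)."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    window = np.hanning(n_fft + 1)[:-1]   # periodic Hann window
    angle = 2.0 * np.pi * k * n / n_fft
    return window * np.cos(angle), -window * np.sin(angle)


def stft_as_conv(signal: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Complex STFT (n_freqs, n_frames) built from a strided correlation with real kernels."""
    real_k, imag_k = stft_kernels(n_fft)
    # Each row is one receptive field of a Conv1d with kernel_size=n_fft, stride=hop
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    real = frames @ real_k.T              # one output channel per cosine kernel
    imag = frames @ imag_k.T              # one output channel per (negated) sine kernel
    return (real + 1j * imag).T           # frequency-major, like torch.stft
```

The result matches `np.fft.rfft(frames * window, axis=-1).T` to floating-point precision; in the exported graph the real and imaginary parts simply stay as two separate real tensors.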

These are the four blockers every previous community attempt at "demucs
onnx" stalled on. See the [README of the demucs-onnx package](https://github.com/StemSplit/demucs-onnx#the-4-blockers-explained)
for the full write-up with code references.
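
Patch 4 replaces the fused attention kernel with ordinary matrix products. A numpy sketch of that decomposition, assuming batch-first `(batch, length, embed)` tensors, PyTorch's packed `in_proj_weight` layout `(3 * embed, embed)`, and no masking or dropout (inference only). It illustrates the math, not the patched module in `demucs_onnx/export/`:

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, key, value, w_in, b_in, w_out, b_out, num_heads):
    """Batch-first multi-head attention from Linear (matmul + bias), bmm and softmax.

    query: (B, Lq, E); key/value: (B, Lk, E)
    w_in: (3E, E) stacked q/k/v projections, b_in: (3E,)
    w_out: (E, E), b_out: (E,)
    """
    B, Lq, E = query.shape
    Lk = key.shape[1]
    head_dim = E // num_heads
    w_q, w_k, w_v = np.split(w_in, 3, axis=0)
    b_q, b_k, b_v = np.split(b_in, 3)
    q = query @ w_q.T + b_q                    # the three input "Linear" layers
    k = key @ w_k.T + b_k
    v = value @ w_v.T + b_v

    def split_heads(t, length):
        # (B, L, E) -> (B*H, L, head_dim): heads folded into the batch axis for bmm
        t = t.reshape(B, length, num_heads, head_dim).transpose(0, 2, 1, 3)
        return t.reshape(B * num_heads, length, head_dim)

    q, k, v = split_heads(q, Lq), split_heads(k, Lk), split_heads(v, Lk)
    scores = (q @ k.transpose(0, 2, 1)) / np.sqrt(head_dim)   # bmm -> (B*H, Lq, Lk)
    ctx = softmax(scores) @ v                                 # bmm -> (B*H, Lq, head_dim)
    ctx = ctx.reshape(B, num_heads, Lq, head_dim).transpose(0, 2, 1, 3).reshape(B, Lq, E)
    return ctx @ w_out.T + b_out               # output "Linear" layer
```

Every step maps onto a standard ONNX op (`MatMul`, `Add`, `Reshape`, `Transpose`, `Softmax`), which is why this form exports where the fused kernel does not.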

---

## Related work

Sibling ONNX repos from the same export pipeline:

| Repo | Format | Stems | Use when |
|---|---|---|---|
| `htdemucs-onnx` *(this)* | Single file | 4 | Faster startup, fewer sessions, roughly 4× lower CPU latency than the FT bag. |
| [`htdemucs-ft-onnx`](https://huggingface.co/StemSplitio/htdemucs-ft-onnx) | Bag of 4 files | 4 | Best SDR, especially on vocals. The default in StemSplit production. |
| [`htdemucs-6s-onnx`](https://huggingface.co/StemSplitio/htdemucs-6s-onnx) | Single file | 6 | Need guitar + piano stems on top of the standard 4. |
| [`htdemucs-ft-{drums,bass,other,vocals}-onnx`](https://huggingface.co/StemSplitio) | Single specialist | 1 | Fastest single-stem inference; 4× faster than the bag. |

Full benchmark across every popular open-source separator:
[StemSplitio/stem-separation-benchmark-2026](https://huggingface.co/datasets/StemSplitio/stem-separation-benchmark-2026).

---

## Skip the infrastructure — use the StemSplit API

Don't want to bundle a 316 MB model in your app, manage a GPU pool, or
write overlap-add chunking? Use the **[StemSplit API](https://stemsplit.io/developers)**
instead — same model under the hood, hosted for you, with credits and a
dashboard.

- 🌐 [stemsplit.io](https://stemsplit.io)
- 📘 [Developer docs](https://stemsplit.io/developers/docs)
- 🔌 [API reference](https://stemsplit.io/developers/reference)

Or use the no-code tools that ship the same model family:

- 🎤 [Vocal Remover](https://stemsplit.io/vocal-remover)
- 🎶 [Karaoke Maker](https://stemsplit.io/karaoke-maker)
- 🎙️ [Acapella Maker](https://stemsplit.io/acapella-maker)
- 📺 [YouTube Stem Splitter](https://stemsplit.io/youtube-stem-splitter)

---

## License & attribution

This repo is **MIT-licensed**, matching the original HT-Demucs.

```bibtex
@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}
```

- Original PyTorch model: [`facebookresearch/demucs`](https://github.com/facebookresearch/demucs)
- ONNX export, parity verification, and packaging by [StemSplit](https://stemsplit.io)
- Search keywords: **htdemucs onnx**, **demucs onnx single file**, **demucs ios**,
  **demucs android**, **music source separation onnx**, **stem separation mobile**.