Duplicated from iBoostAI/Demucs-v4
---
license: mit
tags:
- audio
- music-source-separation
- sound-separation
- demucs
- htdemucs
- stem-separation
- inference
pipeline_tag: audio-to-audio
---

## Music Source Separation

This repository hosts the Demucs v4 models from Facebook Research.

---

## What is HTDemucs?

[HTDemucs (Hybrid Transformer Demucs)](https://github.com/facebookresearch/demucs) is Meta AI's fourth-generation music source separation model, introduced in [Hybrid Transformers for Music Source Separation (Rouard et al., ICASSP 2023)](https://arxiv.org/abs/2211.08553).

Demucs v1 and v2 processed audio purely in the time domain, and v3 (Hybrid Demucs) added a parallel spectrogram branch. HTDemucs keeps those two parallel encoders, one operating on the raw waveform and the other on the STFT spectrogram, and replaces their innermost layers with a cross-domain Transformer Encoder that uses self-attention within each domain and cross-attention between them. This lets the model correlate time-domain and frequency-domain features before decoding. The paper reports 9.00 dB SDR on the MUSDB HQ test set when the model is trained with extra training data.
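To make the cross-attention step more concrete, here is a minimal pure-Python sketch of scaled dot-product cross-attention, in which tokens from one branch (waveform) query tokens from the other branch (spectrogram) and vice versa. It is an illustration of the general technique only, not the HTDemucs implementation: the learned query/key/value projections, multiple heads, layer norms, feed-forward layers, and the interleaving with self-attention layers are all omitted, and the token values are made up.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where the queries come from one domain
    and the keys/values from the other. No learned projections or heads."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output token is a weighted average of the other domain's values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Made-up features: 3 waveform-branch tokens and 2 spectrogram-branch tokens,
# assumed to be already projected to a shared dimension of 2.
wave_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spec_tokens = [[0.5, 0.5], [2.0, -1.0]]

# Each branch attends to the other branch's tokens.
wave_out = cross_attention(wave_tokens, spec_tokens, spec_tokens)
spec_out = cross_attention(spec_tokens, wave_tokens, wave_tokens)
```

Because the attention weights sum to 1, every updated waveform token is a convex combination of spectrogram tokens, which is how information flows from one domain into the other.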

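The `htdemucs_6s` variant adds dedicated guitar and piano stems on top of the standard drums/bass/other/vocals set, which is useful for music production work that needs those instruments isolated. The upstream Demucs README describes this 6-source model as experimental: quick tests show acceptable quality for guitar, but the piano stem has a lot of bleeding and artifacts.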
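A typical production use of separated stems is remixing, for example building a backing track with one instrument removed. The sketch below is a hypothetical helper (`mix_stems` is not part of the demucs package) that sums stems sample by sample while skipping excluded ones. It assumes the stems have already been decoded into equal-length mono sample lists, and the stem names are assumed to match the six sources listed above.

```python
# Stem names assumed for htdemucs_6s (verify against the model you load).
SIX_STEMS = ("drums", "bass", "other", "vocals", "guitar", "piano")


def mix_stems(stems, exclude=()):
    """Hypothetical helper: sum separated stems sample by sample,
    skipping any stem named in `exclude`.

    `stems` maps stem name -> list of mono samples of equal length.
    """
    unknown = set(exclude) - set(stems)
    if unknown:
        raise ValueError(f"unknown stems: {sorted(unknown)}")
    kept = [name for name in stems if name not in exclude]
    if not kept:
        raise ValueError("nothing left to mix")
    length = len(stems[kept[0]])
    if any(len(stems[name]) != length for name in kept):
        raise ValueError("all stems must have the same length")
    return [sum(stems[name][i] for name in kept) for i in range(length)]


# Made-up 4-sample stems standing in for decoded audio.
stems = {name: [0.1 * (k + 1)] * 4 for k, name in enumerate(SIX_STEMS)}

# A guitar-practice backing track: every stem except the guitar.
backing = mix_stems(stems, exclude=("guitar",))
```

Summing the remaining stems works because separation models like Demucs are trained so that the estimated sources add back up to (approximately) the input mixture.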

---

From Facebook Research, describing the original time-domain Demucs architecture:

Demucs is based on a U-Net convolutional architecture inspired by Wave-U-Net and SING, with GLUs, a BiLSTM between the encoder and decoder, specific initialization of weights and transposed convolutions in the decoder.
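The GLU (Gated Linear Unit) mentioned above splits its input channels into two halves and uses one half to gate the other: `out = a * sigmoid(b)`. In the original Demucs encoder, roughly speaking, a 1x1 convolution doubles the channel count and the GLU halves it back. The pure-Python sketch below shows only the gating itself on lists of channels; it follows the same convention as PyTorch's `torch.nn.functional.glu` (first half are values, second half are gates), and the example numbers are made up.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(channels):
    """Gated Linear Unit over the channel axis.

    `channels` holds 2*C channels, each a list of samples. The first C
    channels are the values `a`, the last C are the gates `b`, and the
    result has C channels computed as a * sigmoid(b).
    """
    if len(channels) % 2:
        raise ValueError("GLU needs an even number of channels")
    half = len(channels) // 2
    values, gates = channels[:half], channels[half:]
    return [[a * sigmoid(b) for a, b in zip(va, vb)] for va, vb in zip(values, gates)]


# Four input channels of two samples each -> two output channels.
out = glu([[1.0, -1.0], [3.0, 3.0], [1000.0, -1000.0], [0.0, 0.0]])
```

A large positive gate lets the value pass almost unchanged, a large negative gate suppresses it, and a zero gate halves it, so the network learns per-channel, per-sample how much signal to keep.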

See [facebookresearch's repository](https://github.com/facebookresearch/demucs) for more information on Demucs.