---
license: mit
tags:
- audio
- music-source-separation
- sound-separation
- demucs
- htdemucs
- stem-separation
- inference
pipeline_tag: audio-to-audio
---

## Music Source Separation

These are the Demucs v4 models from Facebook Research.
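A quick usage sketch (not part of the original card): the models ship with the `demucs` Python package, whose CLI entry point can also be driven from Python with an argv list. The package name, flags, and entry point below follow the upstream repository; treat them as assumptions if your installed version differs.

```python
# Invocation sketch; assumes `pip install demucs` and the upstream CLI flags
# (-n selects a model checkpoint, --two-stems keeps one stem plus the rest).
argv = ["-n", "htdemucs", "--two-stems", "vocals", "song.mp3"]
command = "demucs " + " ".join(argv)
print(command)  # the equivalent shell command

# To actually run the separation (downloads model weights on first use):
#   import demucs.separate
#   demucs.separate.main(argv)
```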
---

## What is HTDemucs?

[HTDemucs (Hybrid Transformer Demucs)](https://github.com/facebookresearch/demucs) is Meta AI's fourth-generation music source separation model, introduced in [*Hybrid Transformers for Music Source Separation* (Rouard et al., ICASSP 2023)](https://arxiv.org/abs/2211.08553).

Where earlier Demucs generations processed audio purely in the time domain, HTDemucs runs **two parallel encoders simultaneously** — one operating on the raw waveform, the other on the STFT spectrogram — with a **Transformer Encoder with cross-attention** at the bottleneck connecting them. This lets the model correlate time-domain and frequency-domain features before decoding, yielding measurably better separation quality — especially on spectrally complex, temporally sparse instruments like piano and guitar.
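As a toy illustration of the two input views (my own sketch, not code from the repository): the time branch consumes raw samples, while the spectral branch consumes STFT-like magnitudes, here computed with a naive standard-library DFT over a single frame.

```python
import cmath
import math

def dft_mag(frame):
    """Magnitude spectrum of one frame via a naive DFT (stdlib only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

# Toy "mixture": a pure tone with 3 cycles over a 16-sample frame.
signal = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]

time_branch = signal           # what the waveform branch sees (raw samples)
freq_branch = dft_mag(signal)  # what the spectrogram branch sees (magnitudes)

peak_bin = max(range(len(freq_branch)), key=freq_branch.__getitem__)
print(peak_bin)  # the tone appears as a single spectral peak at bin 3
```

The point of the hybrid design is that a feature like this tone is trivially localized in the frequency view but spread across every sample in the time view; the cross-attention bottleneck lets each branch borrow the other's strengths.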
The `htdemucs_6s` variant adds dedicated guitar and piano stems on top of the standard drums/bass/other/vocals quad, making it one of the most capable publicly available separation models for music production use.
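For orientation, here is a sketch of where the six stems land, based on the default output layout described in the upstream README (`separated/<model>/<track>/<stem>.wav`); the helper name is hypothetical and the directory convention is an assumption if your version configures it differently.

```python
from pathlib import Path

# The six stems produced by htdemucs_6s.
STEMS_6S = ["drums", "bass", "other", "vocals", "guitar", "piano"]

def expected_outputs(track, model="htdemucs_6s", root="separated"):
    """Hypothetical helper: paths the demucs CLI writes by default,
    <root>/<model>/<track name>/<stem>.wav."""
    out_dir = Path(root) / model / Path(track).stem
    return [out_dir / f"{name}.wav" for name in STEMS_6S]

for path in expected_outputs("song.mp3"):
    print(path)
```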
---

From Facebook Research:

Demucs is based on a U-Net convolutional architecture inspired by Wave-U-Net and SING, with GLUs, a BiLSTM between the encoder and decoder, specific initialization of weights, and transposed convolutions in the decoder.
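To make one of the quoted ingredients concrete, here is a minimal GLU (gated linear unit) on a plain Python list. This is an illustrative sketch of the activation itself, not code from Demucs:

```python
import math

def glu(x):
    """Gated Linear Unit: split the input in half; the second half gates
    the first through a sigmoid, letting the network learn to suppress
    features element by element."""
    h = len(x) // 2
    a, b = x[:h], x[h:]
    return [ai * (1.0 / (1.0 + math.exp(-bi))) for ai, bi in zip(a, b)]

out = glu([1.0, 2.0, 0.0, 100.0])
print(out)  # gate 0.0 passes half the signal; gate 100.0 passes almost all
```

Note the output has half the input's length: in Demucs this halving is applied across channels after each convolution, so the convolutions produce twice the channels the next layer consumes.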
See [facebookresearch's repository](https://github.com/facebookresearch/demucs) for more information on Demucs.