---
license: mit
tags:
  - audio
  - audio-separation
  - stem-separation
  - demucs
  - htdemucs
  - safetensors
  - maestraea
pipeline_tag: audio-to-audio
---

# HTDemucs Models (Safetensors)

**4/6-Stem Source Separation — Vocals, Drums, Bass, Other (+Guitar, Piano)**

[Original Source](https://github.com/facebookresearch/demucs) by [Facebook Research](https://github.com/facebookresearch) · MIT License

> Converted from the original `.th` checkpoint format to safetensors for faster loading and safer deserialization. For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).

## Available Models

| File | Stems | Size | Description |
|------|-------|------|-------------|
| `htdemucs.safetensors` | 4 (drums, bass, other, vocals) | 84 MB | Base model |
| `htdemucs_ft.safetensors` | 4 (drums, bass, other, vocals) | 84 MB | **Fine-tuned** — best quality ⭐ |
| `htdemucs_6s.safetensors` | 6 (drums, bass, other, vocals, guitar, piano) | 55 MB | 6-stem variant |

Each model has a matching `*_config.json` with architecture parameters (sources, sample rate, channels).
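
As a minimal sketch of how the weights and config files pair up, the helper below resolves both paths for a model name and parses the config. It assumes the config sits next to the weights and is named `<name>_config.json` (one reading of the `*_config.json` pattern above). `model_files` and `load_config` are hypothetical helper names, not part of any library, and the keys inside the JSON are not assumed here.

```python
import json
from pathlib import Path

# Model names taken from the table above.
MODEL_NAMES = ("htdemucs", "htdemucs_ft", "htdemucs_6s")


def model_files(model_dir, name):
    """Return (weights_path, config_path) for one model name.

    Assumption: the config lives beside the weights as `<name>_config.json`.
    """
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {name!r}")
    root = Path(model_dir)
    return root / f"{name}.safetensors", root / f"{name}_config.json"


def load_config(model_dir, name):
    """Parse the JSON config for `name` and return it as a dict."""
    _, config_path = model_files(model_dir, name)
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `load_config("models", "htdemucs_6s")` would return whatever the JSON file contains; inspect the result rather than relying on particular key names.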

## What HTDemucs Does

HTDemucs (Hybrid Transformer Demucs) separates mixed audio into individual stems:

- **Vocals** — Singing, spoken word
- **Drums** — Percussion, kick, snare, hi-hat
- **Bass** — Bass guitar, synth bass
- **Other** — Everything else (keys, synths, FX)
- **Guitar** — (6-stem model only)
- **Piano** — (6-stem model only)

### Key Features

- Real-time capable on GPU
- Adjustable segment size for VRAM control
- Best separation quality of the three models with `htdemucs_ft`
- Typical VRAM usage: ~4–6 GB
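
To illustrate why segment size controls VRAM, here is a rough sketch of splitting a track into overlapping segments so that only one segment needs to be on the GPU at a time. This is not the actual splitting code from `demucs` or Mæstræa. `segment_bounds` is a hypothetical helper, and the 25% default overlap is an illustrative assumption.

```python
def segment_bounds(total_samples, segment_samples, overlap=0.25):
    """Return (start, end) sample indices for overlapping segments.

    A smaller `segment_samples` puts less audio on the GPU at once,
    at the cost of more segments to process.
    """
    if segment_samples <= 0:
        raise ValueError("segment_samples must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between consecutive segment starts.
    stride = max(1, int(segment_samples * (1.0 - overlap)))
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + segment_samples, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # The last segment reached the end of the track.
        start += stride
    return bounds
```

For a 3-minute track at 44.1 kHz, `segment_bounds(180 * 44100, 10 * 44100)` yields 10-second windows; halving the segment length roughly halves the audio held in memory per forward pass.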

## Original Checkpoint URLs

These safetensors were converted from:

| Model | Original URL |
|-------|-------------|
| htdemucs | `https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th` |
| htdemucs_ft | `https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/04573f0d-f3cf25b2.th` |
| htdemucs_6s | `https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/5c90dfd2-34c22ccb.th` |

## Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend. The same pretrained models can also be loaded by name with the `demucs` library, which fetches the original checkpoints rather than these safetensors files:

```python
from demucs.pretrained import get_model

# Load the fine-tuned 4-stem model by name (weights are downloaded on first use).
model = get_model("htdemucs_ft")
```
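
To check what a converted file contains without loading any weights, the safetensors header can be read with the standard library alone. The sketch below relies only on the published file layout: an 8-byte little-endian header length followed by a UTF-8 JSON header. `read_safetensors_header` is a hypothetical helper name, and the file path in the usage note is an assumption about where the file was saved.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor table from a safetensors file as a dict.

    Each key is a tensor name; each value holds its dtype, shape,
    and byte offsets. No tensor data is read.
    """
    with open(path, "rb") as fh:
        # First 8 bytes: unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header
```

For example, iterating over `read_safetensors_header("htdemucs_ft.safetensors").items()` lists each tensor name with its `dtype` and `shape`.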

## License

MIT — same as the original Demucs release.

## Credits

- **Model**: [Facebook Research / Meta AI](https://github.com/facebookresearch/demucs)
- **Paper**: [Hybrid Transformers for Music Source Separation](https://arxiv.org/abs/2211.08553) (Rouard et al., 2023)
- **Conversion & Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)