---
license: mit
library_name: mlx
tags:
  - mlx
  - audio
  - music-source-separation
  - source-separation
  - demucs
  - htdemucs
  - hdemucs
  - apple-silicon
base_model: adefossez/demucs
pipeline_tag: audio-to-audio
---

Originally from: iky1e/demucs-mlx

Float16 variant: mlx-community/demucs-mlx-fp16

# Demucs — MLX

MLX-compatible weights for all 8 pretrained Demucs models, converted to safetensors format for inference on Apple Silicon.

Demucs is a music source separation model that splits audio into stems: drums, bass, other, vocals (and guitar, piano for 6-source models).

## Models

| Model | What it is | Architecture | Sub-models | Sources | Weights | Tensors |
|---|---|---|---|---|---|---|
| `htdemucs` | Default v4 model, best speed/quality balance | HTDemucs (v4) | 1 | 4 | 160 MB | 573 |
| `htdemucs_ft` | Fine-tuned v4, best overall quality | HTDemucs (v4) | 4 (fine-tuned) | 4 | 641 MB | 2292 |
| `htdemucs_6s` | 6-source v4 (adds guitar + piano stems) | HTDemucs (v4) | 1 | 6 | 105 MB | 565 |
| `hdemucs_mmi` | v3 hybrid, trained on more data | HDemucs (v3) | 1 | 4 | 319 MB | 379 |
| `mdx` | v3 bag-of-models ensemble | Demucs + HDemucs | 4 (bag) | 4 | 1.3 GB | 1298 |
| `mdx_extra` | v3 ensemble trained on extra data | HDemucs | 4 (bag) | 4 | 1.2 GB | 1516 |
| `mdx_q` | Quantized v3 ensemble (same quality, smaller) | Demucs + HDemucs | 4 (bag) | 4 | 1.3 GB | 1298 |
| `mdx_extra_q` | Quantized v3 extra ensemble | HDemucs | 4 (bag) | 4 | 1.2 GB | 1516 |

All models output stereo audio at 44.1 kHz.
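The `4 (bag)` models are ensembles: each sub-model separates the mix independently, and the bag combines the per-stem outputs. A minimal pure-Python sketch of that combining step, assuming a simple unweighted average (illustrative only; this is not the actual inference code, and the real bag may apply per-source weights):

```python
# Illustrative sketch of bag-of-models ensembling: each sub-model
# produces one waveform per stem, and the bag averages them.
# The sample values below are hypothetical stand-ins for model outputs.

def ensemble(bag_outputs):
    """Average per-stem outputs across sub-models.

    bag_outputs: list (one entry per sub-model) of dicts
                 mapping stem name -> list of samples.
    """
    stems = bag_outputs[0].keys()
    n = len(bag_outputs)
    return {
        stem: [sum(out[stem][i] for out in bag_outputs) / n
               for i in range(len(bag_outputs[0][stem]))]
        for stem in stems
    }

# Two hypothetical sub-models, two stems, three samples each.
sub_a = {"vocals": [1.0, 2.0, 3.0], "drums": [0.0, 0.0, 0.0]}
sub_b = {"vocals": [3.0, 2.0, 1.0], "drums": [2.0, 2.0, 2.0]}
print(ensemble([sub_a, sub_b]))
# → {'vocals': [2.0, 2.0, 2.0], 'drums': [1.0, 1.0, 1.0]}
```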

## Origin

- Original model/repo: adefossez/demucs
- License: MIT (same as original Demucs)
- Conversion path: PyTorch checkpoints → safetensors + JSON config (direct, no intermediary)
- Swift MLX port: iky1e/demucs-mlx-swift

No fine-tuning or quantization was applied — these are direct conversions of the original pretrained weights.

## Files

Each model consists of two files at the repo root:

- `{model_name}.safetensors` — model weights (float32)
- `{model_name}_config.json` — model class, architecture config, and bag-of-models metadata
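The safetensors container itself is simple to inspect with nothing but the standard library: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw buffers. A stdlib-only sketch that writes a tiny stand-in file and reads its header back (the tensor name is hypothetical, not taken from the real checkpoints):

```python
import json
import struct
import tempfile

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file: an 8-byte
    little-endian length prefix, then that many bytes of JSON."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Build a tiny stand-in file (one 2-float tensor) instead of real weights.
header = {
    "encoder.0.weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}
}
header_bytes = json.dumps(header).encode("utf-8")
data = struct.pack("<2f", 1.0, 2.0)

with tempfile.NamedTemporaryFile(suffix=".safetensors", delete=False) as f:
    f.write(struct.pack("<Q", len(header_bytes)))
    f.write(header_bytes)
    f.write(data)
    path = f.name

print(read_safetensors_header(path))
# → {'encoder.0.weight': {'dtype': 'F32', 'shape': [2], 'data_offsets': [0, 8]}}
```

The same reader works on the real `{model_name}.safetensors` files if you want to list tensor names without loading the weights.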

## Usage

### Swift (demucs-mlx-swift)

Models are downloaded automatically from this repo. No manual setup required.

```bash
# Separate a song into stems
demucs-mlx-swift -n htdemucs song.wav

# Use a specific model
demucs-mlx-swift -n htdemucs_ft song.wav

# Two-stem mode (vocals + instrumental)
demucs-mlx-swift -n htdemucs --two-stems vocals song.wav
```

Or use the Swift API directly:

```swift
import DemucsMLX

let separator = try DemucsSeparator(modelName: "htdemucs")
let result = try separator.separate(fileAt: URL(fileURLWithPath: "song.wav"))
```
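Two-stem mode keeps the named stem and folds all remaining sources into a single accompaniment track. An illustrative pure-Python sketch of that final mix-down step, with hypothetical sample values (not the real implementation):

```python
# Illustrative two-stems mix-down: keep one separated stem and sum
# the remaining stems into an accompaniment track.
# Sample values below are hypothetical placeholders.

def two_stems(stems, keep="vocals"):
    """Return (kept stem, sum of all other stems)."""
    length = len(stems[keep])
    accompaniment = [0.0] * length
    for name, samples in stems.items():
        if name != keep:
            accompaniment = [a + s for a, s in zip(accompaniment, samples)]
    return stems[keep], accompaniment

separated = {
    "vocals": [0.5, 0.5],
    "drums":  [1.0, 2.0],
    "bass":   [3.0, 1.0],
    "other":  [0.0, 1.0],
}
vocals, no_vocals = two_stems(separated, keep="vocals")
print(no_vocals)  # → [4.0, 4.0]
```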

### Python (demucs-mlx)

```bash
pip install demucs-mlx
demucs-mlx -n htdemucs song.wav
```

## Converting from PyTorch

To reproduce the export directly from PyTorch Demucs checkpoints:

```bash
pip install demucs safetensors numpy

# Export all 8 models
python export_from_pytorch.py --out-dir ./output

# Export specific models
python export_from_pytorch.py --models htdemucs htdemucs_ft --out-dir ./output
```

The conversion script (`export_from_pytorch.py`) is available in the demucs-mlx-swift repo under `scripts/`.

## Citation

```bibtex
@inproceedings{rouard2022hybrid,
  title={Hybrid Transformers for Music Source Separation},
  author={Rouard, Simon and Massa, Francisco and Defossez, Alexandre},
  booktitle={ICASSP 23},
  year={2023}
}

@inproceedings{defossez2021hybrid,
  title={Hybrid Spectrogram and Waveform Source Separation},
  author={Defossez, Alexandre},
  booktitle={Proceedings of the ISMIR 2021 Workshop on Music Source Separation},
  year={2021}
}
```