iky1e committed
Commit 908d2d0 · verified · 1 parent: 4da88ac

Add all 8 Demucs models in float16 safetensors format
README.md ADDED
---
license: mit
library_name: mlx
tags:
- mlx
- audio
- music-source-separation
- source-separation
- demucs
- htdemucs
- hdemucs
- apple-silicon
- float16
base_model: adefossez/demucs
pipeline_tag: audio-to-audio
---

> Originally from: [iky1e/demucs-mlx-fp16](https://huggingface.co/iky1e/demucs-mlx-fp16)
>
> Float32 variant: [mlx-community/demucs-mlx](https://huggingface.co/mlx-community/demucs-mlx)

# Demucs — MLX (float16)

Float16 MLX-compatible weights for all 8 pretrained [Demucs](https://github.com/adefossez/demucs) models, converted to `safetensors` format for inference on Apple Silicon.

This is the **float16 variant** of [iky1e/demucs-mlx](https://huggingface.co/iky1e/demucs-mlx) — same models, half the file size, identical output quality. Recommended for Apple Silicon devices where memory is constrained (iOS, smaller Macs).

Demucs is a music source separation model that splits audio into stems: `drums`, `bass`, `other`, `vocals` (plus `guitar` and `piano` for the 6-source model).

## Models

| Model | What it is | Architecture | Sub-models | Sources | Weights (fp16) | Weights (fp32) |
|-------|-----------|--------------|------------|---------|----------------|----------------|
| `htdemucs` | Default v4 model, best speed/quality balance | HTDemucs (v4) | 1 | 4 | 80 MB | 160 MB |
| `htdemucs_ft` | Fine-tuned v4, best overall quality | HTDemucs (v4) | 4 (fine-tuned) | 4 | 321 MB | 641 MB |
| `htdemucs_6s` | 6-source v4 (adds guitar + piano stems) | HTDemucs (v4) | 1 | 6 | 52 MB | 105 MB |
| `hdemucs_mmi` | v3 hybrid, trained on more data | HDemucs (v3) | 1 | 4 | 160 MB | 319 MB |
| `mdx` | v3 bag-of-models ensemble | Demucs + HDemucs | 4 (bag) | 4 | 659 MB | 1.3 GB |
| `mdx_extra` | v3 ensemble trained on extra data | HDemucs | 4 (bag) | 4 | 638 MB | 1.2 GB |
| `mdx_q` | Quantized v3 ensemble (same quality, smaller) | Demucs + HDemucs | 4 (bag) | 4 | 659 MB | 1.3 GB |
| `mdx_extra_q` | Quantized v3 extra ensemble | HDemucs | 4 (bag) | 4 | 638 MB | 1.2 GB |

All models output stereo audio at 44.1 kHz.

## Float16 vs Float32

Output quality is **identical** — the maximum per-sample difference is 3.1e-5 (one int16 LSB) and correlation with the float32 output exceeds 0.999999999. MLX on Apple Silicon upcasts float16 weights to float32 for computation, so the math is the same.

| Metric | float32 ([iky1e/demucs-mlx](https://huggingface.co/iky1e/demucs-mlx)) | float16 (this repo) |
|--------|---------|---------|
| htdemucs file size | 160 MB | **80 MB** |
| htdemucs RSS (peak memory) | 1311 MB | **1210 MB** |
| htdemucs speed (M1 Pro) | 7.1 s | 7.9 s |
| Output quality | reference | identical |

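For context, the 3.1e-5 threshold quoted above is just the quantization step of 16-bit PCM. A quick arithmetic check (not from the repo): audio normalized to [-1.0, 1.0) spans a width of 2.0 across 65536 int16 steps.

```python
# One int16 LSB for audio in [-1.0, 1.0): 65536 quantization steps over a
# width of 2.0 — the smallest difference representable in 16-bit PCM.
lsb = 2.0 / 65536.0  # = 1 / 32768
print(f"{lsb:.3e}")  # 3.052e-05
```

So a max difference of one LSB means the fp16 and fp32 outputs are bit-identical once rendered to 16-bit audio.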
## Origin

- Original model/repo: [adefossez/demucs](https://github.com/adefossez/demucs)
- Float32 weights: [iky1e/demucs-mlx](https://huggingface.co/iky1e/demucs-mlx)
- License: MIT (same as original Demucs)
- Conversion path: PyTorch checkpoints → safetensors float32 → float16
- Swift MLX port: [kylehowells/demucs-mlx-swift](https://github.com/kylehowells/demucs-mlx-swift)

## Files

Each model consists of two files at the repo root:

- `{model_name}.safetensors` — model weights (float16)
- `{model_name}_config.json` — model class, architecture config, and bag-of-models metadata

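The `_config.json` files serialize Python `Fraction` values (such as `segment`) as `"numerator/denominator"` strings. A minimal sketch of reading one back — the config snippet is abbreviated from `htdemucs_config.json`, and `parse_value` is an illustrative helper, not part of this repo:

```python
import json
from fractions import Fraction

# Abbreviated from htdemucs_config.json for illustration.
config = json.loads("""
{
  "model_name": "htdemucs",
  "model_class": "BagOfModelsMLX",
  "num_models": 1,
  "kwargs": {"segment": "39/5", "samplerate": 44100}
}
""")

def parse_value(v):
    # Turn "39/5" back into a Fraction; leave everything else untouched.
    if isinstance(v, str) and "/" in v:
        num, den = v.split("/")
        if num.lstrip("-").isdigit() and den.isdigit():
            return Fraction(int(num), int(den))
    return v

segment = parse_value(config["kwargs"]["segment"])
print(segment, float(segment))  # 39/5 7.8
```

Here `segment` is the model's processing window length in seconds.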
## Usage

### Swift (demucs-mlx-swift)

Point the model directory or repo to this float16 variant:

```bash
# Use float16 models from a local directory
demucs-mlx-swift -n htdemucs --model-dir /path/to/demucs-mlx-fp16 song.wav

# Or set the HF repo environment variable
export DEMUCS_MLX_SWIFT_MODEL_REPO=iky1e/demucs-mlx-fp16
demucs-mlx-swift -n htdemucs song.wav
```

Or use the Swift API directly:

```swift
import DemucsMLX

let separator = try DemucsSeparator(modelName: "htdemucs")
let result = try separator.separate(fileAt: URL(fileURLWithPath: "song.wav"))
```

## Converting from PyTorch

To reproduce the export directly from PyTorch Demucs checkpoints:

```bash
pip install demucs safetensors numpy

# Export all 8 models as float16 (default)
python export_from_pytorch.py --out-dir ./output

# Export as float32
python export_from_pytorch.py --out-dir ./output --dtype float32
```

The conversion script (`export_from_pytorch.py`) is available in the [demucs-mlx-swift](https://github.com/kylehowells/demucs-mlx-swift) repo under `scripts/`.

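The final float16 step of that conversion path is a per-tensor cast. A minimal numpy sketch (the tensor name and shape are illustrative, not the real export):

```python
import numpy as np

# Illustrative stand-in for one exported tensor; the real export holds the
# remapped Demucs weights as float32 numpy arrays at this point.
weights = {"encoder.0.conv.weight": np.random.randn(48, 8, 2).astype(np.float32)}

# The fp16 export is just a cast: half the bytes per element.
fp16 = {k: v.astype(np.float16) for k, v in weights.items()}

orig = weights["encoder.0.conv.weight"]
half = fp16["encoder.0.conv.weight"]
print(half.dtype, orig.nbytes // half.nbytes)  # float16 2
```

Since MLX upcasts to float32 at load time, the only loss is the half-precision rounding of the stored weights.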
## Citation

```bibtex
@inproceedings{rouard2022hybrid,
  title={Hybrid Transformers for Music Source Separation},
  author={Rouard, Simon and Massa, Francisco and Defossez, Alexandre},
  booktitle={ICASSP 23},
  year={2023}
}

@inproceedings{defossez2021hybrid,
  title={Hybrid Spectrogram and Waveform Source Separation},
  author={Defossez, Alexandre},
  booktitle={Proceedings of the ISMIR 2021 Workshop on Music Source Separation},
  year={2021}
}
```
export_from_pytorch.py ADDED
#!/usr/bin/env python3
"""
Export Demucs PyTorch models directly to safetensors + JSON config for Swift MLX.

Converts all 8 pretrained models directly from the original PyTorch demucs package.
No dependency on demucs-mlx or any other re-implementation.

Usage:
    # Export all models
    python scripts/export_from_pytorch.py --out-dir ~/.cache/demucs-mlx-swift-models

    # Export specific models
    python scripts/export_from_pytorch.py --models htdemucs htdemucs_ft --out-dir ./Models

Requirements:
    pip install demucs safetensors numpy
"""
from __future__ import annotations

import argparse
import inspect
import json
import re
import sys
from fractions import Fraction
from pathlib import Path

import numpy as np
import torch

ALL_MODELS = [
    "htdemucs",
    "htdemucs_ft",
    "htdemucs_6s",
    "hdemucs_mmi",
    "mdx",
    "mdx_extra",
    "mdx_q",
    "mdx_extra_q",
]

# Map PyTorch class names to MLX class names used by the Swift loader
CLASS_MAP = {
    "Demucs": "DemucsMLX",
    "HDemucs": "HDemucsMLX",
    "HTDemucs": "HTDemucsMLX",
}

# Conv-like layer names that get a .conv. wrapper in MLX
CONV_LAYER_NAMES = {
    "conv", "conv_tr", "rewrite",
    "channel_upsampler", "channel_downsampler",
    "channel_upsampler_t", "channel_downsampler_t",
}

# DConv attention sub-module names (LocalState)
DCONV_ATTN_NAMES = {"content", "key", "query", "proj", "query_decay", "query_freqs"}


def to_json_serializable(obj):
    """Convert Python objects to JSON-serializable types."""
    if isinstance(obj, Fraction):
        return f"{obj.numerator}/{obj.denominator}"
    if isinstance(obj, torch.Tensor):
        return obj.item() if obj.numel() == 1 else obj.tolist()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, (list, tuple)):
        return [to_json_serializable(x) for x in obj]
    if isinstance(obj, dict):
        return {str(k): to_json_serializable(v) for k, v in obj.items()}
    return obj


def transpose_conv_weights(key: str, value: np.ndarray, is_conv_transpose: bool = False) -> np.ndarray:
    """Transpose PyTorch conv weights to MLX layout.

    Conv1d:          (out, in, k)    → MLX (out, k, in):    transpose (0, 2, 1)
    Conv2d:          (out, in, h, w) → MLX (out, h, w, in): transpose (0, 2, 3, 1)
    ConvTranspose1d: (in, out, k)    → MLX (out, k, in):    transpose (1, 2, 0)
    ConvTranspose2d: (in, out, h, w) → MLX (out, h, w, in): transpose (1, 2, 3, 0)
    """
    if not key.endswith(".weight"):
        return value

    if len(value.shape) == 3:
        return np.transpose(value, (1, 2, 0) if is_conv_transpose else (0, 2, 1))
    if len(value.shape) == 4:
        return np.transpose(value, (1, 2, 3, 0) if is_conv_transpose else (0, 2, 3, 1))
    return value


def remap_key(
    key: str,
    value: np.ndarray,
    model_type: str = "HTDemucs",
    dconv_conv_slots: set | None = None,
    seq_conv_slots: set | None = None,
) -> list[tuple[str, np.ndarray]]:
    """Remap a PyTorch state dict key to the MLX key convention.

    Returns a list of (key, value) pairs (multiple for attention in_proj splits).
    Duplicate target keys (e.g. LSTM bias_ih + bias_hh) are merged by the caller.

    Args:
        key: PyTorch state dict key
        value: numpy array (already transposed for conv weights)
        model_type: PyTorch class name ("Demucs", "HDemucs", "HTDemucs")
        dconv_conv_slots: set of (block_prefix, slot_str) for DConv slots with conv weights
        seq_conv_slots: set of (enc_dec, layer, slot) for Demucs v1/v2 Sequential Conv slots
    """
    dconv_conv_slots = dconv_conv_slots or set()
    seq_conv_slots = seq_conv_slots or set()

    # =========================================================================
    # Step 1: Demucs v1/v2 Sequential insertion
    #   encoder.{i}.{j}.rest → encoder.{i}.layers.{j}.rest
    #   decoder.{i}.{j}.rest → decoder.{i}.layers.{j}.rest
    # =========================================================================
    if model_type == "Demucs":
        m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)(\..*)?$", key)
        if m:
            enc_dec, layer, slot, rest = m.groups()
            rest = rest or ""
            key = f"{enc_dec}.{layer}.layers.{slot}{rest}"

    # =========================================================================
    # Step 1.5: Demucs v1/v2 Sequential Conv/Norm slot wrapping
    #   encoder.{i}.layers.{j}.weight → encoder.{i}.layers.{j}.conv.weight (if Conv slot)
    # =========================================================================
    if model_type == "Demucs":
        m = re.match(r"(encoder|decoder)\.(\d+)\.layers\.(\d+)\.(weight|bias)$", key)
        if m:
            enc_dec, layer, slot, param = m.groups()
            if (enc_dec, layer, slot) in seq_conv_slots:
                return [(f"{enc_dec}.{layer}.layers.{slot}.conv.{param}", value)]
            else:
                return [(f"{enc_dec}.{layer}.layers.{slot}.{param}", value)]

    # =========================================================================
    # Step 2: DConv internal slot handling
    #   Matches: *.layers.{block_idx}.{slot_idx}.{rest}
    #   Both HDemucs (.dconv.layers.) and Demucs v1/v2 (.layers.{N}.layers.) end
    #   with this pattern after Step 1.
    # =========================================================================
    m = re.match(r"(.+\.layers\.\d+)\.(\d+)\.(.+)$", key)
    if m:
        block_prefix = m.group(1)
        slot = m.group(2)
        rest = m.group(3)

        # --- 2a. Simple weight/bias/scale ---
        if rest in ("weight", "bias", "scale"):
            if rest == "weight" and len(value.shape) >= 2:
                # ≥2D weight = Conv → add .conv.
                return [(f"{block_prefix}.layers.{slot}.conv.{rest}", value)]
            elif rest == "weight":
                # 1D weight = GroupNorm → no wrapper
                return [(f"{block_prefix}.layers.{slot}.{rest}", value)]
            elif rest == "bias":
                if (block_prefix, slot) in dconv_conv_slots:
                    return [(f"{block_prefix}.layers.{slot}.conv.{rest}", value)]
                else:
                    return [(f"{block_prefix}.layers.{slot}.{rest}", value)]
            else:  # scale
                return [(f"{block_prefix}.layers.{slot}.{rest}", value)]

        # --- 2b. LSTM weights/biases ---
        m_lstm = re.match(r"lstm\.(weight|bias)_(ih|hh)_l(\d+)(_reverse)?$", rest)
        if m_lstm:
            wb, ih_hh, layer_idx, reverse = m_lstm.groups()
            direction = "backward_lstms" if reverse else "forward_lstms"
            if wb == "weight":
                param = "Wx" if ih_hh == "ih" else "Wh"
                return [(f"{block_prefix}.layers.{slot}.{direction}.{layer_idx}.{param}", value)]
            else:  # bias — both bias_ih and bias_hh map to the same key; caller merges
                return [(f"{block_prefix}.layers.{slot}.{direction}.{layer_idx}.bias", value)]

        # --- 2c. LSTM linear ---
        m_linear = re.match(r"linear\.(weight|bias)$", rest)
        if m_linear:
            param = m_linear.group(1)
            return [(f"{block_prefix}.layers.{slot}.linear.{param}", value)]

        # --- 2d. Attention sub-modules (LocalState) ---
        m_attn = re.match(r"(content|key|query|proj|query_decay|query_freqs)\.(weight|bias)$", rest)
        if m_attn:
            attn_name, param = m_attn.groups()
            # These are all Conv1d modules → add .conv. wrapper
            return [(f"{block_prefix}.layers.{slot}.{attn_name}.conv.{param}", value)]

        # --- 2e. Fallback for unknown compound keys ---
        return [(f"{block_prefix}.layers.{slot}.{rest}", value)]

    # =========================================================================
    # Step 3: MultiheadAttention in_proj split (HTDemucs transformer)
    # =========================================================================
    m = re.match(r"(.+)\.(self_attn|cross_attn)\.in_proj_(weight|bias)$", key)
    if m:
        prefix, attn_type, param = m.group(1), m.group(2), m.group(3)
        mlx_attn = "attn" if attn_type == "self_attn" else "cross_attn"
        dim = value.shape[0] // 3
        q, k_val, v = value[:dim], value[dim : 2 * dim], value[2 * dim :]
        return [
            (f"{prefix}.{mlx_attn}.query_proj.{param}", q),
            (f"{prefix}.{mlx_attn}.key_proj.{param}", k_val),
            (f"{prefix}.{mlx_attn}.value_proj.{param}", v),
        ]

    # self_attn.out_proj → attn.out_proj
    m = re.match(r"(.+)\.self_attn\.out_proj\.(weight|bias)$", key)
    if m:
        prefix, param = m.group(1), m.group(2)
        return [(f"{prefix}.attn.out_proj.{param}", value)]

    # =========================================================================
    # Step 4: norm_out wrapping → norm_out.gn
    # =========================================================================
    m = re.match(r"(.+)\.norm_out\.(weight|bias)$", key)
    if m:
        prefix, param = m.group(1), m.group(2)
        return [(f"{prefix}.norm_out.gn.{param}", value)]

    # =========================================================================
    # Step 5: Bottleneck LSTM (Demucs v1/v2 and HDemucs)
    #   lstm.lstm.weight_ih_l0 → lstm.forward_lstms.0.Wx
    # =========================================================================
    m = re.match(r"(.+)\.lstm\.(weight|bias)_(ih|hh)_l(\d+)(_reverse)?$", key)
    if m:
        prefix = m.group(1)
        wb = m.group(2)
        ih_hh = m.group(3)
        layer_idx = m.group(4)
        reverse = m.group(5)
        direction = "backward_lstms" if reverse else "forward_lstms"
        if wb == "weight":
            param = "Wx" if ih_hh == "ih" else "Wh"
            return [(f"{prefix}.{direction}.{layer_idx}.{param}", value)]
        else:  # bias — merge handled by caller
            return [(f"{prefix}.{direction}.{layer_idx}.bias", value)]

    # =========================================================================
    # Step 6: Conv/ConvTranspose/Rewrite named layers → add .conv. wrapper
    # =========================================================================
    parts = key.rsplit(".", 1)
    if len(parts) == 2:
        path, param = parts
        path_parts = path.split(".")
        last_name = path_parts[-1]
        if last_name in CONV_LAYER_NAMES and param in ("weight", "bias"):
            return [(f"{path}.conv.{param}", value)]

    # =========================================================================
    # Default: no change
    # =========================================================================
    return [(key, value)]


def convert_sub_model(model, prefix: str) -> dict[str, np.ndarray]:
    """Convert a single sub-model's state dict to MLX-compatible numpy arrays."""
    cls_name = type(model).__name__

    # --- Pre-scan: identify ConvTranspose modules by type ---
    conv_tr_paths = set()
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.ConvTranspose1d, torch.nn.ConvTranspose2d)):
            conv_tr_paths.add(name)

    # --- Collect state dict as numpy ---
    state_items = []
    for key, tensor in model.state_dict().items():
        arr = tensor.detach().cpu().float().numpy()
        state_items.append((key, arr))

    # --- Pre-scan: identify DConv Conv slots (≥2D weights) ---
    # Pattern: *.layers.{block}.{slot}.weight where the value is ≥2D
    # For Demucs v1/v2, apply Sequential insertion first so lookups match remap_key
    dconv_conv_slots: set[tuple[str, str]] = set()
    for key, arr in state_items:
        scan_key = key
        if cls_name == "Demucs":
            m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)(\..*)?$", scan_key)
            if m:
                enc_dec, layer, slot, rest = m.groups()
                rest = rest or ""
                scan_key = f"{enc_dec}.{layer}.layers.{slot}{rest}"
        m = re.match(r"(.+\.layers\.\d+)\.(\d+)\.weight$", scan_key)
        if m and len(arr.shape) >= 2:
            dconv_conv_slots.add((m.group(1), m.group(2)))

    # --- Pre-scan: Demucs v1/v2 Sequential Conv slots ---
    seq_conv_slots: set[tuple[str, str, str]] = set()
    if cls_name == "Demucs":
        for key, arr in state_items:
            m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)\.weight$", key)
            if m and len(arr.shape) >= 2:
                seq_conv_slots.add((m.group(1), m.group(2), m.group(3)))

    # --- Convert ---
    weights: dict[str, np.ndarray] = {}
    for key, arr in state_items:
        # Determine if this belongs to a ConvTranspose module
        is_conv_tr = any(key.startswith(p + ".") for p in conv_tr_paths)

        # Transpose conv weights
        arr = transpose_conv_weights(key, arr, is_conv_transpose=is_conv_tr)

        # Remap key
        remapped = remap_key(key, arr, cls_name, dconv_conv_slots, seq_conv_slots)
        for new_key, new_val in remapped:
            full_key = f"{prefix}{new_key}"
            if full_key in weights:
                # LSTM bias merge: bias_ih + bias_hh → bias (additive)
                weights[full_key] = weights[full_key] + new_val
            else:
                weights[full_key] = new_val

    return weights


def extract_kwargs(model) -> dict:
    """Extract constructor kwargs from a model using _init_args_kwargs or inspection."""
    if hasattr(model, "_init_args_kwargs"):
        _, kwargs = model._init_args_kwargs
        return {k: to_json_serializable(v) for k, v in kwargs.items()
                if isinstance(v, (int, float, str, bool, list, tuple, type(None), Fraction))}

    # Fallback: inspect the __init__ signature and read matching attributes
    sig = inspect.signature(type(model).__init__)
    kwargs = {}
    for name in sig.parameters:
        if name == "self":
            continue
        if hasattr(model, name):
            val = getattr(model, name)
            kwargs[name] = to_json_serializable(val)
    return kwargs


def export_model(model_name: str, out_dir: Path, dtype: str = "float16") -> bool:
    """Export a single model (or bag) to safetensors + config JSON."""
    from demucs.pretrained import get_model
    from demucs.apply import BagOfModels

    print(f"\n--- Exporting {model_name} ---")
    try:
        model = get_model(model_name)
    except Exception as e:
        print(f"  Failed to load model: {e}")
        return False

    is_bag = isinstance(model, BagOfModels)

    if is_bag:
        sub_models = list(model.models)
        num_models = len(sub_models)
        bag_weights = model.weights.tolist() if hasattr(model.weights, "tolist") else list(model.weights)
    else:
        sub_models = [model]
        num_models = 1
        bag_weights = None

    print(f"  {'Bag of ' + str(num_models) + ' models' if is_bag else 'Single model'}")

    # Collect all weights and metadata
    all_weights: dict[str, np.ndarray] = {}
    model_classes: list[str] = []
    model_configs: list[dict] = []

    for i, sub in enumerate(sub_models):
        cls_name = type(sub).__name__
        mlx_cls = CLASS_MAP.get(cls_name, cls_name)
        model_classes.append(mlx_cls)
        print(f"  Model {i}: {cls_name} → {mlx_cls}")

        prefix = f"model_{i}." if is_bag else ""
        sub_weights = convert_sub_model(sub, prefix)
        all_weights.update(sub_weights)

        kwargs = extract_kwargs(sub)
        model_configs.append({
            "model_class": mlx_cls,
            "kwargs": kwargs,
        })

    # Cast to the requested output dtype (conversion above produces float32)
    if dtype == "float16":
        all_weights = {k: v.astype(np.float16) for k, v in all_weights.items()}

    # Build config JSON
    config: dict = {
        "model_name": model_name,
        "tensor_count": len(all_weights),
        "dtype": dtype,
    }

    if is_bag:
        config["model_class"] = "BagOfModelsMLX"
        config["num_models"] = num_models
        config["weights"] = bag_weights
        config["sub_model_classes"] = model_classes

        # If all sub-models are the same class, set sub_model_class for compat
        unique = set(model_classes)
        if len(unique) == 1:
            config["sub_model_class"] = unique.pop()

        config["model_configs"] = model_configs

        # Also put kwargs at top level for single-model bags (common case)
        if num_models == 1:
            config["kwargs"] = model_configs[0]["kwargs"]
    else:
        config["model_class"] = model_classes[0]
        config["kwargs"] = model_configs[0]["kwargs"]

    # Save files
    model_dir = out_dir / model_name
    model_dir.mkdir(parents=True, exist_ok=True)

    safetensors_path = model_dir / f"{model_name}.safetensors"
    config_path = model_dir / f"{model_name}_config.json"

    # Save safetensors (prefer the safetensors library, fall back to mlx)
    try:
        from safetensors.numpy import save_file
        save_file(all_weights, str(safetensors_path))
    except ImportError:
        import mlx.core as mx
        mlx_weights = {k: mx.array(v) for k, v in all_weights.items()}
        mx.save_safetensors(str(safetensors_path), mlx_weights)

    with config_path.open("w") as f:
        json.dump(config, f, indent=2, default=str)

    size_mb = safetensors_path.stat().st_size / (1024 * 1024)
    print(f"  Wrote {safetensors_path} ({len(all_weights)} tensors, {size_mb:.0f} MB)")
    print(f"  Wrote {config_path}")
    return True


def main():
    ap = argparse.ArgumentParser(
        description="Export Demucs PyTorch models to safetensors for Swift MLX"
    )
    ap.add_argument(
        "--models",
        nargs="*",
        default=None,
        help=f"Models to export (default: all). Choices: {', '.join(ALL_MODELS)}",
    )
    ap.add_argument(
        "--out-dir",
        default="./Models",
        help="Output root directory (files go into <out-dir>/<model_name>/)",
    )
    ap.add_argument(
        "--dtype",
        choices=["float16", "float32"],
        default="float16",
        help="Output dtype for exported weights (default: float16)",
    )
    args = ap.parse_args()

    models = args.models or ALL_MODELS
    out_dir = Path(args.out_dir).resolve()

    exported = 0
    failed = 0

    for name in models:
        if export_model(name, out_dir, args.dtype):
            exported += 1
        else:
            failed += 1

    print(f"\n=== Done: {exported} exported, {failed} failed ===")
    if failed:
        sys.exit(1)


if __name__ == "__main__":
    main()
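As a standalone illustration of the Step 1 remap performed by `remap_key` above (the sample keys are hypothetical):

```python
import re

def insert_layers(key: str) -> str:
    # Demucs v1/v2 addresses nn.Sequential children by bare index;
    # the MLX side expects an explicit ".layers." path segment.
    m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)(\..*)?$", key)
    if not m:
        return key
    enc_dec, layer, slot, rest = m.groups()
    return f"{enc_dec}.{layer}.layers.{slot}{rest or ''}"

print(insert_layers("encoder.0.0.weight"))  # encoder.0.layers.0.weight
print(insert_layers("decoder.3.2.bias"))    # decoder.3.layers.2.bias
print(insert_layers("lstm.weight_ih_l0"))   # unchanged: lstm.weight_ih_l0
```

The later steps follow the same pattern: pure regex rewrites of state-dict keys, with values only modified where layouts differ (conv transposes, in_proj splits, LSTM bias merges).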
hdemucs_mmi.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:296eba7d60dd1f1cd8c623ffb6d5712e3781e6fb0117f77d5966513b913a4568
size 167283844
hdemucs_mmi_config.json ADDED
{
  "model_name": "hdemucs_mmi",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HDemucsMLX",
  "num_models": 1,
  "weights": [
    [
      1.0,
      1.0,
      1.0,
      1.0
    ]
  ],
  "args": [],
  "kwargs": {
    "sources": [
      "drums",
      "bass",
      "other",
      "vocals"
    ],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 6,
    "rewrite": true,
    "hybrid": true,
    "hybrid_old": false,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 1,
    "dconv_depth": 2,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.001,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 379,
  "dtype": "float16"
}
htdemucs.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fc770cfcd06cceac138f9586e74cbdc65f26dadd79c0cc6658ff6b1159bf3f92
size 84036122
htdemucs_6s.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:91bf1175c95f6c173d34abb65c2dc1256a0847def4529c7fd20abdea165a6299
size 54896338
htdemucs_6s_config.json ADDED
{
  "model_name": "htdemucs_6s",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HTDemucsMLX",
  "num_models": 1,
  "weights": [
    [
      1.0,
      1.0,
      1.0,
      1.0,
      1.0,
      1.0
    ]
  ],
  "args": [],
  "kwargs": {
    "sources": [
      "drums",
      "bass",
      "other",
      "vocals",
      "guitar",
      "piano"
    ],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": "39/5",
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 4,
    "rewrite": true,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 3,
    "dconv_depth": 2,
    "dconv_comp": 8,
    "dconv_init": 0.001,
    "bottom_channels": 0,
    "t_layers": 5,
    "t_hidden_scale": 4.0,
    "t_heads": 8,
    "t_dropout": 0.02,
    "t_layer_scale": true,
    "t_gelu": true,
    "t_emb": "sin",
    "t_max_positions": 10000,
    "t_max_period": 10000.0,
    "t_weight_pos_embed": 1.0,
    "t_cape_mean_normalize": true,
    "t_cape_augment": true,
    "t_cape_glob_loc_scale": [
      5000.0,
      1.0,
      1.4
    ],
    "t_sin_random_shift": 0,
    "t_norm_in": true,
    "t_norm_in_group": false,
    "t_group_norm": false,
    "t_norm_first": true,
    "t_norm_out": true,
    "t_weight_decay": 0.0,
    "t_lr": null,
    "t_sparse_self_attn": false,
    "t_sparse_cross_attn": false,
    "t_mask_type": "diag",
    "t_mask_random_seed": 42,
    "t_sparse_attn_window": 400,
    "t_global_window": 100,
    "t_sparsity": 0.95,
    "t_auto_sparsity": false,
    "t_cross_first": false,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 565,
  "dtype": "float16"
}
htdemucs_config.json ADDED
{
  "model_name": "htdemucs",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HTDemucsMLX",
  "num_models": 1,
  "weights": [
    [
      1.0,
      1.0,
      1.0,
      1.0
    ]
  ],
  "args": [],
  "kwargs": {
    "sources": [
      "drums",
      "bass",
      "other",
      "vocals"
    ],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": "39/5",
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 4,
    "rewrite": true,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 3,
    "dconv_depth": 2,
    "dconv_comp": 8,
    "dconv_init": 0.001,
    "bottom_channels": 512,
    "t_layers": 5,
    "t_hidden_scale": 4.0,
    "t_heads": 8,
    "t_dropout": 0.02,
    "t_layer_scale": true,
    "t_gelu": true,
    "t_emb": "sin",
    "t_max_positions": 10000,
    "t_max_period": 10000.0,
    "t_weight_pos_embed": 1.0,
    "t_cape_mean_normalize": true,
    "t_cape_augment": true,
    "t_cape_glob_loc_scale": [
      5000.0,
      1.0,
      1.4
    ],
    "t_sin_random_shift": 0,
    "t_norm_in": true,
    "t_norm_in_group": false,
    "t_group_norm": false,
    "t_norm_first": true,
    "t_norm_out": true,
    "t_weight_decay": 0.0,
    "t_lr": null,
    "t_sparse_self_attn": false,
    "t_sparse_cross_attn": false,
    "t_mask_type": "diag",
    "t_mask_random_seed": 42,
    "t_sparse_attn_window": 400,
    "t_global_window": 100,
    "t_sparsity": 0.95,
    "t_auto_sparsity": false,
    "t_cross_first": false,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 573,
  "dtype": "float16"
}
htdemucs_ft.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:6eb655cf72902869d3118fb3c4b243d6e6de73225472433af6fe54f3a5575a89
size 336148303
htdemucs_ft_config.json ADDED

{
  "model_name": "htdemucs_ft",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HTDemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": "39/5",
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 4,
    "rewrite": true,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 3,
    "dconv_depth": 2,
    "dconv_comp": 8,
    "dconv_init": 0.001,
    "bottom_channels": 512,
    "t_layers": 5,
    "t_hidden_scale": 4.0,
    "t_heads": 8,
    "t_dropout": 0.02,
    "t_layer_scale": true,
    "t_gelu": true,
    "t_emb": "sin",
    "t_max_positions": 10000,
    "t_max_period": 10000.0,
    "t_weight_pos_embed": 1.0,
    "t_cape_mean_normalize": true,
    "t_cape_augment": true,
    "t_cape_glob_loc_scale": [5000.0, 1.0, 1.4],
    "t_sin_random_shift": 0,
    "t_norm_in": true,
    "t_norm_in_group": false,
    "t_group_norm": false,
    "t_norm_first": true,
    "t_norm_out": true,
    "t_weight_decay": 0.05,
    "t_lr": null,
    "t_sparse_self_attn": false,
    "t_sparse_cross_attn": false,
    "t_mask_type": "diag",
    "t_mask_random_seed": 42,
    "t_sparse_attn_window": 400,
    "t_global_window": 100,
    "t_sparsity": 0.95,
    "t_auto_sparsity": false,
    "t_cross_first": false,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 2292,
  "dtype": "float16"
}
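The identity `weights` matrix in `htdemucs_ft_config.json` encodes that each of the four fine-tuned sub-models is responsible for exactly one source. A rough sketch of the per-source weighted averaging a bag of models performs (illustrative code, not the actual `BagOfModelsMLX` implementation; `weights[m][s]` is sub-model `m`'s weight for source `s`):

```python
def combine_bag(estimates, weights):
    """Weighted per-source average of sub-model estimates.

    estimates: one entry per sub-model, each a list of per-source values.
    weights:   same shape; row m gives sub-model m's per-source weights.
    """
    num_sources = len(weights[0])
    out = []
    for s in range(num_sources):
        total = sum(w[s] for w in weights)
        out.append(sum(w[s] * e[s] for w, e in zip(weights, estimates)) / total)
    return out

# Identity weights (as in htdemucs_ft): each fine-tuned sub-model is
# dedicated to one source — drums, bass, other, vocals respectively.
identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# Dummy scalar "estimates" standing in for per-source audio tensors:
est = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]]

print(combine_bag(est, identity))  # [10.0, 21.0, 32.0, 43.0]
```

With an all-ones matrix (as in `mdx_extra` below) the same formula reduces to a plain mean over the four sub-models.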
mdx.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:6fe9bf1d699ff8a90236c7cec124419a01dcaf79f2d6333d6c79bc048c10a33e
size 690908505
mdx_config.json ADDED

{
  "model_name": "mdx",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "DemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 64,
    "growth": 2,
    "depth": 6,
    "rewrite": false,
    "lstm_layers": 0,
    "kernel_size": 8,
    "stride": 4,
    "context": 1,
    "gelu": true,
    "glu": true,
    "norm_groups": 4,
    "norm_starts": 4,
    "dconv_depth": 2,
    "dconv_mode": 1,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.0001,
    "resample": true,
    "normalize": true,
    "rescale": 0.1,
    "gelu_act": true,
    "glu_act": true
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1298,
  "sub_model_classes": ["DemucsMLX", "DemucsMLX", "HDemucsMLX", "HDemucsMLX"],
  "model_configs": [
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [0.1, 0.3],
        "multi_freqs_depth": 2,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}
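`mdx` is a heterogeneous bag: two time-domain `DemucsMLX` sub-models and two hybrid `HDemucsMLX` ones, so its config carries a per-sub-model `model_configs` list alongside the shared top-level `kwargs`. A hedged sketch of how a loader might dispatch on that layout (the registry entries and constructors below are stand-ins, not the real MLX classes):

```python
# Placeholder "constructors" keyed by model_class; a real loader would map
# these names to the actual MLX model classes instead.
REGISTRY = {
    "DemucsMLX": lambda **kw: ("DemucsMLX", kw["channels"]),
    "HDemucsMLX": lambda **kw: ("HDemucsMLX", kw["channels"]),
}

def build_submodels(config):
    """Instantiate the sub-models described by a bag config dict."""
    per_model = config.get("model_configs")
    if per_model:
        # Heterogeneous bag: one {"model_class", "kwargs"} entry per sub-model.
        return [REGISTRY[c["model_class"]](**c["kwargs"]) for c in per_model]
    # Homogeneous bag: every sub-model shares sub_model_class and kwargs.
    make = REGISTRY[config["sub_model_class"]]
    return [make(**config["kwargs"]) for _ in range(config["num_models"])]

cfg = {"model_configs": [
    {"model_class": "DemucsMLX", "kwargs": {"channels": 64}},
    {"model_class": "HDemucsMLX", "kwargs": {"channels": 48}},
]}
print(build_submodels(cfg))  # [('DemucsMLX', 64), ('HDemucsMLX', 48)]
```

Note the sub-model kwargs differ (64 time-domain channels vs. 48 spectral channels in the configs above), which is exactly why the per-model list exists.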
mdx_extra.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:40c4a84c16b27ec0d9fdbd89d12dbac147fd5e3705abd0d4fcbb0f95d2a8463e
size 669121267
mdx_extra_config.json ADDED

{
  "model_name": "mdx_extra",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HDemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 6,
    "rewrite": true,
    "hybrid": true,
    "hybrid_old": false,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 1,
    "dconv_depth": 2,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.0001,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1516,
  "model_configs": [
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}
mdx_extra_q.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:a9c0ef032296244e18a478812663bcd5d6a513031656623c0b8ddc19ef4ad827
size 669121267
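These pointer files follow the git-lfs v1 spec: one `key value` pair per line. A small parser is handy for scripting against the repo (e.g. checking sizes before fetching the blobs):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # byte size of the real blob
    return fields

# The mdx_extra_q pointer from this commit:
ptr = """version https://git-lfs.github.com/spec/v1
oid sha256:a9c0ef032296244e18a478812663bcd5d6a513031656623c0b8ddc19ef4ad827
size 669121267"""

info = parse_lfs_pointer(ptr)
print(info["size"] // 1_000_000)  # ~669 MB
```

Note that `mdx_extra.safetensors` and `mdx_extra_q.safetensors` report identical byte sizes (669121267), as do `mdx` and `mdx_q` (690908505) — presumably the quantized and unquantized variants converge to the same layout once re-stored as float16.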
mdx_extra_q_config.json ADDED

{
  "model_name": "mdx_extra_q",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HDemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 6,
    "rewrite": true,
    "hybrid": true,
    "hybrid_old": false,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 1,
    "dconv_depth": 2,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.001,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1516,
  "model_configs": [
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}
mdx_q.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:151f7b3917ae8325d7612690932d4e1be0acd40ee6e51fd42f9219b1c25e8c6c
size 690908505
mdx_q_config.json ADDED

{
  "model_name": "mdx_q",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "DemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 64,
    "growth": 2,
    "depth": 6,
    "rewrite": false,
    "lstm_layers": 0,
    "kernel_size": 8,
    "stride": 4,
    "context": 1,
    "gelu": true,
    "glu": true,
    "norm_groups": 4,
    "norm_starts": 4,
    "dconv_depth": 2,
    "dconv_mode": 1,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.0001,
    "resample": true,
    "normalize": true,
    "rescale": 0.1,
    "gelu_act": true,
    "glu_act": true
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1298,
  "sub_model_classes": ["DemucsMLX", "DemucsMLX", "HDemucsMLX", "HDemucsMLX"],
  "model_configs": [
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [0.1, 0.3],
        "multi_freqs_depth": 2,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}