iky1e committed
Commit 908d2d0 · verified · 1 parent: 4da88ac

Add all 8 Demucs models in float16 safetensors format
README.md ADDED
---
license: mit
library_name: mlx
tags:
- mlx
- audio
- music-source-separation
- source-separation
- demucs
- htdemucs
- hdemucs
- apple-silicon
- float16
base_model: adefossez/demucs
pipeline_tag: audio-to-audio
---

> Originally from: [iky1e/demucs-mlx-fp16](https://huggingface.co/iky1e/demucs-mlx-fp16)
>
> Float32 variant: [mlx-community/demucs-mlx](https://huggingface.co/mlx-community/demucs-mlx)

# Demucs — MLX (float16)

Float16 MLX-compatible weights for all 8 pretrained [Demucs](https://github.com/adefossez/demucs) models, converted to `safetensors` format for inference on Apple Silicon.

This is the **float16 variant** of [iky1e/demucs-mlx](https://huggingface.co/iky1e/demucs-mlx) — same models, half the file size, identical output quality. Recommended for Apple Silicon devices where memory is constrained (iOS, smaller Macs).

Demucs is a music source separation model that splits audio into stems: `drums`, `bass`, `other`, `vocals` (plus `guitar` and `piano` for the 6-source model).

## Models

| Model | What it is | Architecture | Sub-models | Sources | Weights (fp16) | Weights (fp32) |
|-------|-----------|--------------|------------|---------|----------------|----------------|
| `htdemucs` | Default v4 model, best speed/quality balance | HTDemucs (v4) | 1 | 4 | 80 MB | 160 MB |
| `htdemucs_ft` | Fine-tuned v4, best overall quality | HTDemucs (v4) | 4 (fine-tuned) | 4 | 321 MB | 641 MB |
| `htdemucs_6s` | 6-source v4 (adds guitar + piano stems) | HTDemucs (v4) | 1 | 6 | 52 MB | 105 MB |
| `hdemucs_mmi` | v3 hybrid, trained on more data | HDemucs (v3) | 1 | 4 | 160 MB | 319 MB |
| `mdx` | v3 bag-of-models ensemble | Demucs + HDemucs | 4 (bag) | 4 | 659 MB | 1.3 GB |
| `mdx_extra` | v3 ensemble trained on extra data | HDemucs | 4 (bag) | 4 | 638 MB | 1.2 GB |
| `mdx_q` | Quantized v3 ensemble (same quality, smaller) | Demucs + HDemucs | 4 (bag) | 4 | 659 MB | 1.3 GB |
| `mdx_extra_q` | Quantized v3 extra ensemble | HDemucs | 4 (bag) | 4 | 638 MB | 1.2 GB |

All models output stereo audio at 44.1 kHz.

## Float16 vs Float32

Output quality is **identical** — the maximum per-sample difference is 3.1e-5 (one int16 LSB) and correlation with the float32 output exceeds 0.999999999. MLX on Apple Silicon upcasts float16 weights to float32 for computation, so the math is the same.

| Metric | float32 ([iky1e/demucs-mlx](https://huggingface.co/iky1e/demucs-mlx)) | float16 (this repo) |
|--------|---------|---------|
| htdemucs file size | 160 MB | **80 MB** |
| htdemucs RSS (peak memory) | 1311 MB | **1210 MB** |
| htdemucs speed (M1 Pro) | 7.1 s | 7.9 s |
| Output quality | reference | identical |

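For context, the 3.1e-5 threshold quoted above is just the quantization step of 16-bit PCM. A quick arithmetic check (not from the repo): audio normalized to [-1.0, 1.0) spans a width of 2.0 across 65536 int16 steps.

```python
# One int16 LSB for audio in [-1.0, 1.0): 65536 quantization steps over a
# width of 2.0 — the smallest difference representable in 16-bit PCM.
lsb = 2.0 / 65536.0  # = 1 / 32768
print(f"{lsb:.3e}")  # 3.052e-05
```

So a max difference of one LSB means the fp16 and fp32 outputs are bit-identical once rendered to 16-bit audio.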
## Origin

- Original model/repo: [adefossez/demucs](https://github.com/adefossez/demucs)
- Float32 weights: [iky1e/demucs-mlx](https://huggingface.co/iky1e/demucs-mlx)
- License: MIT (same as original Demucs)
- Conversion path: PyTorch checkpoints → safetensors float32 → float16
- Swift MLX port: [kylehowells/demucs-mlx-swift](https://github.com/kylehowells/demucs-mlx-swift)

## Files

Each model consists of two files at the repo root:

- `{model_name}.safetensors` — model weights (float16)
- `{model_name}_config.json` — model class, architecture config, and bag-of-models metadata

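The `_config.json` files serialize Python `Fraction` values (such as `segment`) as `"numerator/denominator"` strings. A minimal sketch of reading one back — the config snippet is abbreviated from `htdemucs_config.json`, and `parse_value` is an illustrative helper, not part of this repo:

```python
import json
from fractions import Fraction

# Abbreviated from htdemucs_config.json for illustration.
config = json.loads("""
{
  "model_name": "htdemucs",
  "model_class": "BagOfModelsMLX",
  "num_models": 1,
  "kwargs": {"segment": "39/5", "samplerate": 44100}
}
""")

def parse_value(v):
    # Turn "39/5" back into a Fraction; leave everything else untouched.
    if isinstance(v, str) and "/" in v:
        num, den = v.split("/")
        if num.lstrip("-").isdigit() and den.isdigit():
            return Fraction(int(num), int(den))
    return v

segment = parse_value(config["kwargs"]["segment"])
print(segment, float(segment))  # 39/5 7.8
```

Here `segment` is the model's processing window length in seconds.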
## Usage

### Swift (demucs-mlx-swift)

Point the model directory or repo to this float16 variant:

```bash
# Use float16 models from a local directory
demucs-mlx-swift -n htdemucs --model-dir /path/to/demucs-mlx-fp16 song.wav

# Or set the HF repo environment variable
export DEMUCS_MLX_SWIFT_MODEL_REPO=iky1e/demucs-mlx-fp16
demucs-mlx-swift -n htdemucs song.wav
```

Or use the Swift API directly:

```swift
import DemucsMLX

let separator = try DemucsSeparator(modelName: "htdemucs")
let result = try separator.separate(fileAt: URL(fileURLWithPath: "song.wav"))
```

## Converting from PyTorch

To reproduce the export directly from PyTorch Demucs checkpoints:

```bash
pip install demucs safetensors numpy

# Export all 8 models as float16 (default)
python export_from_pytorch.py --out-dir ./output

# Export as float32
python export_from_pytorch.py --out-dir ./output --dtype float32
```

The conversion script (`export_from_pytorch.py`) is available in the [demucs-mlx-swift](https://github.com/kylehowells/demucs-mlx-swift) repo under `scripts/`.

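The final float16 step of that conversion path is a per-tensor cast. A minimal numpy sketch (the tensor name and shape are illustrative, not the real export):

```python
import numpy as np

# Illustrative stand-in for one exported tensor; the real export holds the
# remapped Demucs weights as float32 numpy arrays at this point.
weights = {"encoder.0.conv.weight": np.random.randn(48, 8, 2).astype(np.float32)}

# The fp16 export is just a cast: half the bytes per element.
fp16 = {k: v.astype(np.float16) for k, v in weights.items()}

orig = weights["encoder.0.conv.weight"]
half = fp16["encoder.0.conv.weight"]
print(half.dtype, orig.nbytes // half.nbytes)  # float16 2
```

Since MLX upcasts to float32 at load time, the only loss is the half-precision rounding of the stored weights.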
## Citation

```bibtex
@inproceedings{rouard2022hybrid,
  title={Hybrid Transformers for Music Source Separation},
  author={Rouard, Simon and Massa, Francisco and Defossez, Alexandre},
  booktitle={ICASSP 23},
  year={2023}
}

@inproceedings{defossez2021hybrid,
  title={Hybrid Spectrogram and Waveform Source Separation},
  author={Defossez, Alexandre},
  booktitle={Proceedings of the ISMIR 2021 Workshop on Music Source Separation},
  year={2021}
}
```
export_from_pytorch.py ADDED
#!/usr/bin/env python3
"""
Export Demucs PyTorch models directly to safetensors + JSON config for Swift MLX.

Converts all 8 pretrained models directly from the original PyTorch demucs package.
No dependency on demucs-mlx or any other re-implementation.

Usage:
    # Export all models
    python scripts/export_from_pytorch.py --out-dir ~/.cache/demucs-mlx-swift-models

    # Export specific models
    python scripts/export_from_pytorch.py --models htdemucs htdemucs_ft --out-dir ./Models

Requirements:
    pip install demucs safetensors numpy
"""
from __future__ import annotations

import argparse
import inspect
import json
import re
import sys
from fractions import Fraction
from pathlib import Path

import numpy as np
import torch

ALL_MODELS = [
    "htdemucs",
    "htdemucs_ft",
    "htdemucs_6s",
    "hdemucs_mmi",
    "mdx",
    "mdx_extra",
    "mdx_q",
    "mdx_extra_q",
]

# Map PyTorch class names to MLX class names used by the Swift loader
CLASS_MAP = {
    "Demucs": "DemucsMLX",
    "HDemucs": "HDemucsMLX",
    "HTDemucs": "HTDemucsMLX",
}

# Conv-like layer names that get a .conv. wrapper in MLX
CONV_LAYER_NAMES = {
    "conv", "conv_tr", "rewrite",
    "channel_upsampler", "channel_downsampler",
    "channel_upsampler_t", "channel_downsampler_t",
}

# DConv attention sub-module names (LocalState)
DCONV_ATTN_NAMES = {"content", "key", "query", "proj", "query_decay", "query_freqs"}


def to_json_serializable(obj):
    """Convert Python objects to JSON-serializable types."""
    if isinstance(obj, Fraction):
        return f"{obj.numerator}/{obj.denominator}"
    if isinstance(obj, torch.Tensor):
        return obj.item() if obj.numel() == 1 else obj.tolist()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, (list, tuple)):
        return [to_json_serializable(x) for x in obj]
    if isinstance(obj, dict):
        return {str(k): to_json_serializable(v) for k, v in obj.items()}
    return obj


def transpose_conv_weights(key: str, value: np.ndarray, is_conv_transpose: bool = False) -> np.ndarray:
    """Transpose PyTorch conv weights to MLX layout.

    Conv1d:          (out, in, k)    → MLX (out, k, in):    transpose (0, 2, 1)
    Conv2d:          (out, in, h, w) → MLX (out, h, w, in): transpose (0, 2, 3, 1)
    ConvTranspose1d: (in, out, k)    → MLX (out, k, in):    transpose (1, 2, 0)
    ConvTranspose2d: (in, out, h, w) → MLX (out, h, w, in): transpose (1, 2, 3, 0)
    """
    if not key.endswith(".weight"):
        return value

    if len(value.shape) == 3:
        return np.transpose(value, (1, 2, 0) if is_conv_transpose else (0, 2, 1))
    if len(value.shape) == 4:
        return np.transpose(value, (1, 2, 3, 0) if is_conv_transpose else (0, 2, 3, 1))
    return value


def remap_key(
    key: str,
    value: np.ndarray,
    model_type: str = "HTDemucs",
    dconv_conv_slots: set | None = None,
    seq_conv_slots: set | None = None,
) -> list[tuple[str, np.ndarray]]:
    """Remap a PyTorch state dict key to the MLX key convention.

    Returns a list of (key, value) pairs (multiple for attention in_proj splits).
    Duplicate target keys (e.g. LSTM bias_ih + bias_hh) are merged by the caller.

    Args:
        key: PyTorch state dict key
        value: numpy array (already transposed for conv weights)
        model_type: PyTorch class name ("Demucs", "HDemucs", "HTDemucs")
        dconv_conv_slots: set of (block_prefix, slot_str) for DConv slots with conv weights
        seq_conv_slots: set of (enc_dec, layer, slot) for Demucs v1/v2 Sequential Conv slots
    """
    dconv_conv_slots = dconv_conv_slots or set()
    seq_conv_slots = seq_conv_slots or set()

    # =========================================================================
    # Step 1: Demucs v1/v2 Sequential insertion
    #   encoder.{i}.{j}.rest → encoder.{i}.layers.{j}.rest
    #   decoder.{i}.{j}.rest → decoder.{i}.layers.{j}.rest
    # =========================================================================
    if model_type == "Demucs":
        m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)(\..*)?$", key)
        if m:
            enc_dec, layer, slot, rest = m.groups()
            rest = rest or ""
            key = f"{enc_dec}.{layer}.layers.{slot}{rest}"

    # =========================================================================
    # Step 1.5: Demucs v1/v2 Sequential Conv/Norm slot wrapping
    #   encoder.{i}.layers.{j}.weight → encoder.{i}.layers.{j}.conv.weight (if Conv slot)
    # =========================================================================
    if model_type == "Demucs":
        m = re.match(r"(encoder|decoder)\.(\d+)\.layers\.(\d+)\.(weight|bias)$", key)
        if m:
            enc_dec, layer, slot, param = m.groups()
            if (enc_dec, layer, slot) in seq_conv_slots:
                return [(f"{enc_dec}.{layer}.layers.{slot}.conv.{param}", value)]
            else:
                return [(f"{enc_dec}.{layer}.layers.{slot}.{param}", value)]

    # =========================================================================
    # Step 2: DConv internal slot handling
    #   Matches: *.layers.{block_idx}.{slot_idx}.{rest}
    #   Both HDemucs (.dconv.layers.) and Demucs v1/v2 (.layers.{N}.layers.) end
    #   with this pattern after Step 1.
    # =========================================================================
    m = re.match(r"(.+\.layers\.\d+)\.(\d+)\.(.+)$", key)
    if m:
        block_prefix = m.group(1)
        slot = m.group(2)
        rest = m.group(3)

        # --- 2a. Simple weight/bias/scale ---
        if rest in ("weight", "bias", "scale"):
            if rest == "weight" and len(value.shape) >= 2:
                # ≥2D weight = Conv → add .conv.
                return [(f"{block_prefix}.layers.{slot}.conv.{rest}", value)]
            elif rest == "weight":
                # 1D weight = GroupNorm → no wrapper
                return [(f"{block_prefix}.layers.{slot}.{rest}", value)]
            elif rest == "bias":
                if (block_prefix, slot) in dconv_conv_slots:
                    return [(f"{block_prefix}.layers.{slot}.conv.{rest}", value)]
                else:
                    return [(f"{block_prefix}.layers.{slot}.{rest}", value)]
            else:  # scale
                return [(f"{block_prefix}.layers.{slot}.{rest}", value)]

        # --- 2b. LSTM weights/biases ---
        m_lstm = re.match(r"lstm\.(weight|bias)_(ih|hh)_l(\d+)(_reverse)?$", rest)
        if m_lstm:
            wb, ih_hh, layer_idx, reverse = m_lstm.groups()
            direction = "backward_lstms" if reverse else "forward_lstms"
            if wb == "weight":
                param = "Wx" if ih_hh == "ih" else "Wh"
                return [(f"{block_prefix}.layers.{slot}.{direction}.{layer_idx}.{param}", value)]
            else:  # bias — both bias_ih and bias_hh map to the same key; caller merges
                return [(f"{block_prefix}.layers.{slot}.{direction}.{layer_idx}.bias", value)]

        # --- 2c. LSTM linear ---
        m_linear = re.match(r"linear\.(weight|bias)$", rest)
        if m_linear:
            param = m_linear.group(1)
            return [(f"{block_prefix}.layers.{slot}.linear.{param}", value)]

        # --- 2d. Attention sub-modules (LocalState) ---
        m_attn = re.match(r"(content|key|query|proj|query_decay|query_freqs)\.(weight|bias)$", rest)
        if m_attn:
            attn_name, param = m_attn.groups()
            # These are all Conv1d modules → add .conv. wrapper
            return [(f"{block_prefix}.layers.{slot}.{attn_name}.conv.{param}", value)]

        # --- 2e. Fallback for unknown compound keys ---
        return [(f"{block_prefix}.layers.{slot}.{rest}", value)]

    # =========================================================================
    # Step 3: MultiheadAttention in_proj split (HTDemucs transformer)
    # =========================================================================
    m = re.match(r"(.+)\.(self_attn|cross_attn)\.in_proj_(weight|bias)$", key)
    if m:
        prefix, attn_type, param = m.group(1), m.group(2), m.group(3)
        mlx_attn = "attn" if attn_type == "self_attn" else "cross_attn"
        dim = value.shape[0] // 3
        q, k_val, v = value[:dim], value[dim : 2 * dim], value[2 * dim :]
        return [
            (f"{prefix}.{mlx_attn}.query_proj.{param}", q),
            (f"{prefix}.{mlx_attn}.key_proj.{param}", k_val),
            (f"{prefix}.{mlx_attn}.value_proj.{param}", v),
        ]

    # self_attn.out_proj → attn.out_proj
    m = re.match(r"(.+)\.self_attn\.out_proj\.(weight|bias)$", key)
    if m:
        prefix, param = m.group(1), m.group(2)
        return [(f"{prefix}.attn.out_proj.{param}", value)]

    # =========================================================================
    # Step 4: norm_out wrapping → norm_out.gn
    # =========================================================================
    m = re.match(r"(.+)\.norm_out\.(weight|bias)$", key)
    if m:
        prefix, param = m.group(1), m.group(2)
        return [(f"{prefix}.norm_out.gn.{param}", value)]

    # =========================================================================
    # Step 5: Bottleneck LSTM (Demucs v1/v2 and HDemucs)
    #   lstm.lstm.weight_ih_l0 → lstm.forward_lstms.0.Wx
    # =========================================================================
    m = re.match(r"(.+)\.lstm\.(weight|bias)_(ih|hh)_l(\d+)(_reverse)?$", key)
    if m:
        prefix = m.group(1)
        wb = m.group(2)
        ih_hh = m.group(3)
        layer_idx = m.group(4)
        reverse = m.group(5)
        direction = "backward_lstms" if reverse else "forward_lstms"
        if wb == "weight":
            param = "Wx" if ih_hh == "ih" else "Wh"
            return [(f"{prefix}.{direction}.{layer_idx}.{param}", value)]
        else:  # bias — merge handled by caller
            return [(f"{prefix}.{direction}.{layer_idx}.bias", value)]

    # =========================================================================
    # Step 6: Conv/ConvTranspose/Rewrite named layers → add .conv. wrapper
    # =========================================================================
    parts = key.rsplit(".", 1)
    if len(parts) == 2:
        path, param = parts
        path_parts = path.split(".")
        last_name = path_parts[-1]
        if last_name in CONV_LAYER_NAMES and param in ("weight", "bias"):
            return [(f"{path}.conv.{param}", value)]

    # =========================================================================
    # Default: no change
    # =========================================================================
    return [(key, value)]


def convert_sub_model(model, prefix: str) -> dict[str, np.ndarray]:
    """Convert a single sub-model's state dict to MLX-compatible numpy arrays."""
    cls_name = type(model).__name__

    # --- Pre-scan: identify ConvTranspose modules by type ---
    conv_tr_paths = set()
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.ConvTranspose1d, torch.nn.ConvTranspose2d)):
            conv_tr_paths.add(name)

    # --- Collect state dict as numpy ---
    state_items = []
    for key, tensor in model.state_dict().items():
        arr = tensor.detach().cpu().float().numpy()
        state_items.append((key, arr))

    # --- Pre-scan: identify DConv Conv slots (≥2D weights) ---
    # Pattern: *.layers.{block}.{slot}.weight where the value is ≥2D
    # For Demucs v1/v2, apply Sequential insertion first so lookups match remap_key
    dconv_conv_slots: set[tuple[str, str]] = set()
    for key, arr in state_items:
        scan_key = key
        if cls_name == "Demucs":
            m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)(\..*)?$", scan_key)
            if m:
                enc_dec, layer, slot, rest = m.groups()
                rest = rest or ""
                scan_key = f"{enc_dec}.{layer}.layers.{slot}{rest}"
        m = re.match(r"(.+\.layers\.\d+)\.(\d+)\.weight$", scan_key)
        if m and len(arr.shape) >= 2:
            dconv_conv_slots.add((m.group(1), m.group(2)))

    # --- Pre-scan: Demucs v1/v2 Sequential Conv slots ---
    seq_conv_slots: set[tuple[str, str, str]] = set()
    if cls_name == "Demucs":
        for key, arr in state_items:
            m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)\.weight$", key)
            if m and len(arr.shape) >= 2:
                seq_conv_slots.add((m.group(1), m.group(2), m.group(3)))

    # --- Convert ---
    weights: dict[str, np.ndarray] = {}
    for key, arr in state_items:
        # Determine if this belongs to a ConvTranspose module
        is_conv_tr = any(key.startswith(p + ".") for p in conv_tr_paths)

        # Transpose conv weights
        arr = transpose_conv_weights(key, arr, is_conv_transpose=is_conv_tr)

        # Remap key
        remapped = remap_key(key, arr, cls_name, dconv_conv_slots, seq_conv_slots)
        for new_key, new_val in remapped:
            full_key = f"{prefix}{new_key}"
            if full_key in weights:
                # LSTM bias merge: bias_ih + bias_hh → bias (additive)
                weights[full_key] = weights[full_key] + new_val
            else:
                weights[full_key] = new_val

    return weights


def extract_kwargs(model) -> dict:
    """Extract constructor kwargs from a model using _init_args_kwargs or inspection."""
    if hasattr(model, "_init_args_kwargs"):
        _, kwargs = model._init_args_kwargs
        return {k: to_json_serializable(v) for k, v in kwargs.items()
                if isinstance(v, (int, float, str, bool, list, tuple, type(None), Fraction))}

    # Fallback: inspect the __init__ signature and read matching attributes
    sig = inspect.signature(type(model).__init__)
    kwargs = {}
    for name in sig.parameters:
        if name == "self":
            continue
        if hasattr(model, name):
            val = getattr(model, name)
            kwargs[name] = to_json_serializable(val)
    return kwargs


def export_model(model_name: str, out_dir: Path, dtype: str = "float16") -> bool:
    """Export a single model (or bag) to safetensors + config JSON."""
    from demucs.pretrained import get_model
    from demucs.apply import BagOfModels

    print(f"\n--- Exporting {model_name} ---")
    try:
        model = get_model(model_name)
    except Exception as e:
        print(f"  Failed to load model: {e}")
        return False

    is_bag = isinstance(model, BagOfModels)

    if is_bag:
        sub_models = list(model.models)
        num_models = len(sub_models)
        bag_weights = model.weights.tolist() if hasattr(model.weights, "tolist") else list(model.weights)
    else:
        sub_models = [model]
        num_models = 1
        bag_weights = None

    print(f"  {'Bag of ' + str(num_models) + ' models' if is_bag else 'Single model'}")

    # Collect all weights and metadata
    all_weights: dict[str, np.ndarray] = {}
    model_classes: list[str] = []
    model_configs: list[dict] = []

    for i, sub in enumerate(sub_models):
        cls_name = type(sub).__name__
        mlx_cls = CLASS_MAP.get(cls_name, cls_name)
        model_classes.append(mlx_cls)
        print(f"  Model {i}: {cls_name} → {mlx_cls}")

        prefix = f"model_{i}." if is_bag else ""
        sub_weights = convert_sub_model(sub, prefix)
        all_weights.update(sub_weights)

        kwargs = extract_kwargs(sub)
        model_configs.append({
            "model_class": mlx_cls,
            "kwargs": kwargs,
        })

    # Cast to the requested output dtype (conversion above produces float32)
    if dtype == "float16":
        all_weights = {k: v.astype(np.float16) for k, v in all_weights.items()}

    # Build config JSON
    config: dict = {
        "model_name": model_name,
        "tensor_count": len(all_weights),
        "dtype": dtype,
    }

    if is_bag:
        config["model_class"] = "BagOfModelsMLX"
        config["num_models"] = num_models
        config["weights"] = bag_weights
        config["sub_model_classes"] = model_classes

        # If all sub-models are the same class, set sub_model_class for compat
        unique = set(model_classes)
        if len(unique) == 1:
            config["sub_model_class"] = unique.pop()

        config["model_configs"] = model_configs

        # Also put kwargs at top level for single-model bags (common case)
        if num_models == 1:
            config["kwargs"] = model_configs[0]["kwargs"]
    else:
        config["model_class"] = model_classes[0]
        config["kwargs"] = model_configs[0]["kwargs"]

    # Save files
    model_dir = out_dir / model_name
    model_dir.mkdir(parents=True, exist_ok=True)

    safetensors_path = model_dir / f"{model_name}.safetensors"
    config_path = model_dir / f"{model_name}_config.json"

    # Save safetensors (prefer the safetensors library, fall back to mlx)
    try:
        from safetensors.numpy import save_file
        save_file(all_weights, str(safetensors_path))
    except ImportError:
        import mlx.core as mx
        mlx_weights = {k: mx.array(v) for k, v in all_weights.items()}
        mx.save_safetensors(str(safetensors_path), mlx_weights)

    with config_path.open("w") as f:
        json.dump(config, f, indent=2, default=str)

    size_mb = safetensors_path.stat().st_size / (1024 * 1024)
    print(f"  Wrote {safetensors_path} ({len(all_weights)} tensors, {size_mb:.0f} MB)")
    print(f"  Wrote {config_path}")
    return True


def main():
    ap = argparse.ArgumentParser(
        description="Export Demucs PyTorch models to safetensors for Swift MLX"
    )
    ap.add_argument(
        "--models",
        nargs="*",
        default=None,
        help=f"Models to export (default: all). Choices: {', '.join(ALL_MODELS)}",
    )
    ap.add_argument(
        "--out-dir",
        default="./Models",
        help="Output root directory (files go into <out-dir>/<model_name>/)",
    )
    ap.add_argument(
        "--dtype",
        choices=["float16", "float32"],
        default="float16",
        help="Output dtype for exported weights (default: float16)",
    )
    args = ap.parse_args()

    models = args.models or ALL_MODELS
    out_dir = Path(args.out_dir).resolve()

    exported = 0
    failed = 0

    for name in models:
        if export_model(name, out_dir, args.dtype):
            exported += 1
        else:
            failed += 1

    print(f"\n=== Done: {exported} exported, {failed} failed ===")
    if failed:
        sys.exit(1)


if __name__ == "__main__":
    main()
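As a standalone illustration of the Step 1 remap performed by `remap_key` above (the sample keys are hypothetical):

```python
import re

def insert_layers(key: str) -> str:
    # Demucs v1/v2 addresses nn.Sequential children by bare index;
    # the MLX side expects an explicit ".layers." path segment.
    m = re.match(r"(encoder|decoder)\.(\d+)\.(\d+)(\..*)?$", key)
    if not m:
        return key
    enc_dec, layer, slot, rest = m.groups()
    return f"{enc_dec}.{layer}.layers.{slot}{rest or ''}"

print(insert_layers("encoder.0.0.weight"))  # encoder.0.layers.0.weight
print(insert_layers("decoder.3.2.bias"))    # decoder.3.layers.2.bias
print(insert_layers("lstm.weight_ih_l0"))   # unchanged: lstm.weight_ih_l0
```

The later steps follow the same pattern: pure regex rewrites of state-dict keys, with values only modified where layouts differ (conv transposes, in_proj splits, LSTM bias merges).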
hdemucs_mmi.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:296eba7d60dd1f1cd8c623ffb6d5712e3781e6fb0117f77d5966513b913a4568
size 167283844
hdemucs_mmi_config.json ADDED
{
  "model_name": "hdemucs_mmi",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HDemucsMLX",
  "num_models": 1,
  "weights": [
    [
      1.0,
      1.0,
      1.0,
      1.0
    ]
  ],
  "args": [],
  "kwargs": {
    "sources": [
      "drums",
      "bass",
      "other",
      "vocals"
    ],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 6,
    "rewrite": true,
    "hybrid": true,
    "hybrid_old": false,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 1,
    "dconv_depth": 2,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.001,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 379,
  "dtype": "float16"
}
htdemucs.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fc770cfcd06cceac138f9586e74cbdc65f26dadd79c0cc6658ff6b1159bf3f92
size 84036122
htdemucs_6s.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:91bf1175c95f6c173d34abb65c2dc1256a0847def4529c7fd20abdea165a6299
size 54896338
htdemucs_6s_config.json ADDED
{
  "model_name": "htdemucs_6s",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HTDemucsMLX",
  "num_models": 1,
  "weights": [
    [
      1.0,
      1.0,
      1.0,
      1.0,
      1.0,
      1.0
    ]
  ],
  "args": [],
  "kwargs": {
    "sources": [
      "drums",
      "bass",
      "other",
      "vocals",
      "guitar",
      "piano"
    ],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": "39/5",
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 4,
    "rewrite": true,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 3,
    "dconv_depth": 2,
    "dconv_comp": 8,
    "dconv_init": 0.001,
    "bottom_channels": 0,
    "t_layers": 5,
    "t_hidden_scale": 4.0,
    "t_heads": 8,
    "t_dropout": 0.02,
    "t_layer_scale": true,
    "t_gelu": true,
    "t_emb": "sin",
    "t_max_positions": 10000,
    "t_max_period": 10000.0,
    "t_weight_pos_embed": 1.0,
    "t_cape_mean_normalize": true,
    "t_cape_augment": true,
    "t_cape_glob_loc_scale": [
      5000.0,
      1.0,
      1.4
    ],
    "t_sin_random_shift": 0,
    "t_norm_in": true,
    "t_norm_in_group": false,
    "t_group_norm": false,
    "t_norm_first": true,
    "t_norm_out": true,
    "t_weight_decay": 0.0,
    "t_lr": null,
    "t_sparse_self_attn": false,
    "t_sparse_cross_attn": false,
    "t_mask_type": "diag",
    "t_mask_random_seed": 42,
    "t_sparse_attn_window": 400,
    "t_global_window": 100,
    "t_sparsity": 0.95,
    "t_auto_sparsity": false,
    "t_cross_first": false,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 565,
  "dtype": "float16"
}
htdemucs_config.json ADDED
{
  "model_name": "htdemucs",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HTDemucsMLX",
  "num_models": 1,
  "weights": [
    [
      1.0,
      1.0,
      1.0,
      1.0
    ]
  ],
  "args": [],
  "kwargs": {
    "sources": [
      "drums",
      "bass",
      "other",
      "vocals"
    ],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": "39/5",
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 4,
    "rewrite": true,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 3,
    "dconv_depth": 2,
    "dconv_comp": 8,
    "dconv_init": 0.001,
    "bottom_channels": 512,
    "t_layers": 5,
    "t_hidden_scale": 4.0,
    "t_heads": 8,
    "t_dropout": 0.02,
    "t_layer_scale": true,
    "t_gelu": true,
    "t_emb": "sin",
    "t_max_positions": 10000,
    "t_max_period": 10000.0,
    "t_weight_pos_embed": 1.0,
    "t_cape_mean_normalize": true,
    "t_cape_augment": true,
    "t_cape_glob_loc_scale": [
      5000.0,
      1.0,
      1.4
    ],
    "t_sin_random_shift": 0,
    "t_norm_in": true,
    "t_norm_in_group": false,
    "t_group_norm": false,
    "t_norm_first": true,
    "t_norm_out": true,
    "t_weight_decay": 0.0,
    "t_lr": null,
    "t_sparse_self_attn": false,
    "t_sparse_cross_attn": false,
    "t_mask_type": "diag",
    "t_mask_random_seed": 42,
    "t_sparse_attn_window": 400,
    "t_global_window": 100,
    "t_sparsity": 0.95,
    "t_auto_sparsity": false,
    "t_cross_first": false,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 573,
  "dtype": "float16"
}
htdemucs_ft.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:6eb655cf72902869d3118fb3c4b243d6e6de73225472433af6fe54f3a5575a89
size 336148303
htdemucs_ft_config.json ADDED

{
  "model_name": "htdemucs_ft",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HTDemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": "39/5",
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 4,
    "rewrite": true,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 3,
    "dconv_depth": 2,
    "dconv_comp": 8,
    "dconv_init": 0.001,
    "bottom_channels": 512,
    "t_layers": 5,
    "t_hidden_scale": 4.0,
    "t_heads": 8,
    "t_dropout": 0.02,
    "t_layer_scale": true,
    "t_gelu": true,
    "t_emb": "sin",
    "t_max_positions": 10000,
    "t_max_period": 10000.0,
    "t_weight_pos_embed": 1.0,
    "t_cape_mean_normalize": true,
    "t_cape_augment": true,
    "t_cape_glob_loc_scale": [5000.0, 1.0, 1.4],
    "t_sin_random_shift": 0,
    "t_norm_in": true,
    "t_norm_in_group": false,
    "t_group_norm": false,
    "t_norm_first": true,
    "t_norm_out": true,
    "t_weight_decay": 0.05,
    "t_lr": null,
    "t_sparse_self_attn": false,
    "t_sparse_cross_attn": false,
    "t_mask_type": "diag",
    "t_mask_random_seed": 42,
    "t_sparse_attn_window": 400,
    "t_global_window": 100,
    "t_sparsity": 0.95,
    "t_auto_sparsity": false,
    "t_cross_first": false,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 2292,
  "dtype": "float16"
}
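The identity `weights` matrix in `htdemucs_ft_config.json` encodes that each of the four fine-tuned sub-models is responsible for exactly one source. A rough sketch of the per-source weighted averaging a bag of models performs (illustrative code, not the actual `BagOfModelsMLX` implementation; `weights[m][s]` is sub-model `m`'s weight for source `s`):

```python
def combine_bag(estimates, weights):
    """Weighted per-source average of sub-model estimates.

    estimates: one entry per sub-model, each a list of per-source values.
    weights:   same shape; row m gives sub-model m's per-source weights.
    """
    num_sources = len(weights[0])
    out = []
    for s in range(num_sources):
        total = sum(w[s] for w in weights)
        out.append(sum(w[s] * e[s] for w, e in zip(weights, estimates)) / total)
    return out

# Identity weights (as in htdemucs_ft): each fine-tuned sub-model is
# dedicated to one source — drums, bass, other, vocals respectively.
identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# Dummy scalar "estimates" standing in for per-source audio tensors:
est = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]]

print(combine_bag(est, identity))  # [10.0, 21.0, 32.0, 43.0]
```

With an all-ones matrix (as in `mdx_extra` below) the same formula reduces to a plain mean over the four sub-models.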
mdx.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:6fe9bf1d699ff8a90236c7cec124419a01dcaf79f2d6333d6c79bc048c10a33e
size 690908505
mdx_config.json ADDED

{
  "model_name": "mdx",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "DemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 64,
    "growth": 2,
    "depth": 6,
    "rewrite": false,
    "lstm_layers": 0,
    "kernel_size": 8,
    "stride": 4,
    "context": 1,
    "gelu": true,
    "glu": true,
    "norm_groups": 4,
    "norm_starts": 4,
    "dconv_depth": 2,
    "dconv_mode": 1,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.0001,
    "resample": true,
    "normalize": true,
    "rescale": 0.1,
    "gelu_act": true,
    "glu_act": true
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1298,
  "sub_model_classes": ["DemucsMLX", "DemucsMLX", "HDemucsMLX", "HDemucsMLX"],
  "model_configs": [
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [0.1, 0.3],
        "multi_freqs_depth": 2,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}
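`mdx` is a heterogeneous bag: two time-domain `DemucsMLX` sub-models and two hybrid `HDemucsMLX` ones, so its config carries a per-sub-model `model_configs` list alongside the shared top-level `kwargs`. A hedged sketch of how a loader might dispatch on that layout (the registry entries and constructors below are stand-ins, not the real MLX classes):

```python
# Placeholder "constructors" keyed by model_class; a real loader would map
# these names to the actual MLX model classes instead.
REGISTRY = {
    "DemucsMLX": lambda **kw: ("DemucsMLX", kw["channels"]),
    "HDemucsMLX": lambda **kw: ("HDemucsMLX", kw["channels"]),
}

def build_submodels(config):
    """Instantiate the sub-models described by a bag config dict."""
    per_model = config.get("model_configs")
    if per_model:
        # Heterogeneous bag: one {"model_class", "kwargs"} entry per sub-model.
        return [REGISTRY[c["model_class"]](**c["kwargs"]) for c in per_model]
    # Homogeneous bag: every sub-model shares sub_model_class and kwargs.
    make = REGISTRY[config["sub_model_class"]]
    return [make(**config["kwargs"]) for _ in range(config["num_models"])]

cfg = {"model_configs": [
    {"model_class": "DemucsMLX", "kwargs": {"channels": 64}},
    {"model_class": "HDemucsMLX", "kwargs": {"channels": 48}},
]}
print(build_submodels(cfg))  # [('DemucsMLX', 64), ('HDemucsMLX', 48)]
```

Note the sub-model kwargs differ (64 time-domain channels vs. 48 spectral channels in the configs above), which is exactly why the per-model list exists.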
mdx_extra.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:40c4a84c16b27ec0d9fdbd89d12dbac147fd5e3705abd0d4fcbb0f95d2a8463e
size 669121267
mdx_extra_config.json ADDED

{
  "model_name": "mdx_extra",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HDemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 6,
    "rewrite": true,
    "hybrid": true,
    "hybrid_old": false,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 1,
    "dconv_depth": 2,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.0001,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1516,
  "model_configs": [
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}
mdx_extra_q.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:a9c0ef032296244e18a478812663bcd5d6a513031656623c0b8ddc19ef4ad827
size 669121267
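These pointer files follow the git-lfs v1 spec: one `key value` pair per line. A small parser is handy for scripting against the repo (e.g. checking sizes before fetching the blobs):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # byte size of the real blob
    return fields

# The mdx_extra_q pointer from this commit:
ptr = """version https://git-lfs.github.com/spec/v1
oid sha256:a9c0ef032296244e18a478812663bcd5d6a513031656623c0b8ddc19ef4ad827
size 669121267"""

info = parse_lfs_pointer(ptr)
print(info["size"] // 1_000_000)  # ~669 MB
```

Note that `mdx_extra.safetensors` and `mdx_extra_q.safetensors` report identical byte sizes (669121267), as do `mdx` and `mdx_q` (690908505) — presumably the quantized and unquantized variants converge to the same layout once re-stored as float16.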
mdx_extra_q_config.json ADDED

{
  "model_name": "mdx_extra_q",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "HDemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 48,
    "channels_time": null,
    "growth": 2,
    "nfft": 4096,
    "wiener_iters": 0,
    "end_iters": 0,
    "wiener_residual": false,
    "cac": true,
    "depth": 6,
    "rewrite": true,
    "hybrid": true,
    "hybrid_old": false,
    "multi_freqs": [],
    "multi_freqs_depth": 3,
    "freq_emb": 0.2,
    "emb_scale": 10,
    "emb_smooth": true,
    "kernel_size": 8,
    "stride": 4,
    "time_stride": 2,
    "context": 1,
    "context_enc": 0,
    "norm_starts": 4,
    "norm_groups": 4,
    "dconv_mode": 1,
    "dconv_depth": 2,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.001,
    "rescale": 0.1
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1516,
  "model_configs": [
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HTDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 4,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}
mdx_q.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:151f7b3917ae8325d7612690932d4e1be0acd40ee6e51fd42f9219b1c25e8c6c
size 690908505
mdx_q_config.json ADDED

{
  "model_name": "mdx_q",
  "model_class": "BagOfModelsMLX",
  "sub_model_class": "DemucsMLX",
  "num_models": 4,
  "weights": [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0]
  ],
  "args": [],
  "kwargs": {
    "sources": ["drums", "bass", "other", "vocals"],
    "audio_channels": 2,
    "samplerate": 44100,
    "segment": 44,
    "channels": 64,
    "growth": 2,
    "depth": 6,
    "rewrite": false,
    "lstm_layers": 0,
    "kernel_size": 8,
    "stride": 4,
    "context": 1,
    "gelu": true,
    "glu": true,
    "norm_groups": 4,
    "norm_starts": 4,
    "dconv_depth": 2,
    "dconv_mode": 1,
    "dconv_comp": 4,
    "dconv_attn": 4,
    "dconv_lstm": 4,
    "dconv_init": 0.0001,
    "resample": true,
    "normalize": true,
    "rescale": 0.1,
    "gelu_act": true,
    "glu_act": true
  },
  "mlx_version": "0.30.3",
  "tensor_count": 1298,
  "sub_model_classes": ["DemucsMLX", "DemucsMLX", "HDemucsMLX", "HDemucsMLX"],
  "model_configs": [
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "DemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 64,
        "growth": 2,
        "depth": 6,
        "rewrite": false,
        "lstm_layers": 0,
        "kernel_size": 8,
        "stride": 4,
        "context": 1,
        "gelu": true,
        "glu": true,
        "norm_groups": 4,
        "norm_starts": 4,
        "dconv_depth": 2,
        "dconv_mode": 1,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "resample": true,
        "normalize": true,
        "rescale": 0.1,
        "gelu_act": true,
        "glu_act": true
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": false,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": true,
        "multi_freqs": [],
        "multi_freqs_depth": 3,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    },
    {
      "model_class": "HDemucsMLX",
      "kwargs": {
        "sources": ["drums", "bass", "other", "vocals"],
        "audio_channels": 2,
        "samplerate": 44100,
        "segment": 44,
        "channels": 48,
        "channels_time": null,
        "growth": 2,
        "nfft": 4096,
        "wiener_iters": 0,
        "end_iters": 0,
        "wiener_residual": false,
        "cac": true,
        "depth": 6,
        "rewrite": true,
        "hybrid": true,
        "hybrid_old": false,
        "multi_freqs": [0.1, 0.3],
        "multi_freqs_depth": 2,
        "freq_emb": 0.2,
        "emb_scale": 10,
        "emb_smooth": true,
        "kernel_size": 8,
        "stride": 4,
        "time_stride": 2,
        "context": 1,
        "context_enc": 0,
        "norm_starts": 999,
        "norm_groups": 4,
        "dconv_mode": 1,
        "dconv_depth": 2,
        "dconv_comp": 4,
        "dconv_attn": 4,
        "dconv_lstm": 4,
        "dconv_init": 0.0001,
        "rescale": 0.1
      }
    }
  ],
  "dtype": "float16"
}