Cocktail-Fork MRX (MLX)
MERL MRX ported to Apple MLX: 3-stem music/speech/sfx soundtrack separation. Numerically exact vs PyTorch. 4 variants.
How to use mlx-community/Cocktail-Fork-MRX with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Cocktail-Fork-MRX mlx-community/Cocktail-Fork-MRX
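The same download can be done from Python rather than the shell. The sketch below uses `snapshot_download` from `huggingface_hub`; the helper name `download_model` and the default local directory are illustrative choices, not part of this project.

```python
REPO_ID = "mlx-community/Cocktail-Fork-MRX"


def download_model(local_dir: str = "Cocktail-Fork-MRX") -> str:
    """Fetch every file in the model repo and return the local folder path."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_model()` needs network access and places the files in `./Cocktail-Fork-MRX`, mirroring the `--local-dir` flag of the CLI command.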
Apple MLX port of MERL's MRX (Multi-Resolution CrossNet), which separates a soundtrack mixture into three stems: music, speech, and sound effects (sfx). Runs natively on Apple Silicon, with no PyTorch needed at inference.
Variants: default_ (SNR-loss trained; the upstream default inference weights); -paper (SI-SNR, ICASSP reproduction); -adapted-loudness; -adapted-eq (cinematic-tuned for real movie stems).
Numerical parity: … 9e-8; per-stem SI-SDR 107–139 dB vs torch.
pip install cocktail-fork-mlx
# or: pip install git+https://github.com/xocialize/cocktail-fork-mlx
cocktail-fork-mlx --audio-path soundtrack.wav --out-dir ./out
# -> out/music.wav out/speech.wav out/sfx.wav
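To separate a whole folder of soundtracks, the CLI can be called once per file. This is a minimal sketch that relies only on the `--audio-path` and `--out-dir` flags shown above; the helper names `build_command` and `separate_folder`, and the one-folder-per-input layout, are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_command(audio_path: Path, out_root: Path) -> list[str]:
    """Build the CLI call for one file, using one output folder per input."""
    out_dir = out_root / audio_path.stem
    return ["cocktail-fork-mlx", "--audio-path", str(audio_path), "--out-dir", str(out_dir)]


def separate_folder(in_dir: str, out_root: str) -> None:
    """Run the CLI once for each .wav file found directly inside in_dir."""
    for wav in sorted(Path(in_dir).glob("*.wav")):
        # Create the target folder up front, in case the CLI does not create it.
        (Path(out_root) / wav.stem).mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(wav, Path(out_root)), check=True)
```

For example, `separate_folder("mixes", "out")` would write `out/<name>/music.wav`, `speech.wav` and `sfx.wav` for each `mixes/<name>.wav`, assuming the CLI names its outputs as shown above.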
import mlx.core as mx
import numpy as np
import soundfile as sf
from cocktail_fork_mlx.separate import separate_soundtrack
from cocktail_fork_mlx.weights import from_pretrained
audio, fs = sf.read("soundtrack.wav", always_2d=True) # 44.1 kHz
model = from_pretrained("mlx-community/Cocktail-Fork-MRX")
stems = separate_soundtrack(mx.array(audio.T.astype("float32")), model)
for name, x in stems.items():
    sf.write(f"{name}.wav", np.array(x).T, 44100)  # one file per stem, written at 44.1 kHz
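The per-stem SI-SDR figure quoted earlier compares this port's output with the PyTorch output. The sketch below shows the standard scale-invariant SDR formula in NumPy so the size of such numbers is easier to read. It is not the project's own evaluation script, and the synthetic signals are stand-ins for real stems.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB between two signals of equal length."""
    estimate = np.asarray(estimate, dtype=np.float64).ravel()
    reference = np.asarray(reference, dtype=np.float64).ravel()
    # Project the estimate onto the reference to get the optimally scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return float(10.0 * np.log10(ratio))


rng = np.random.default_rng(0)
reference = rng.standard_normal(44100)  # one second of synthetic audio at 44.1 kHz
# Stand-in for a port whose output differs only by tiny rounding-scale noise.
estimate = reference + 1e-7 * rng.standard_normal(44100)
print(f"SI-SDR: {si_sdr(estimate, reference):.1f} dB")
```

A value above 100 dB means the residual is more than ten orders of magnitude weaker in power than the signal, which is consistent with differences at the level of floating-point rounding.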
Ported by MVS Collective (xocialize). MIT, © MERL for the original model/weights.