MLX-converted Open-Unmix L (large variant) for music source separation on Apple Silicon.
Separates stereo music into 4 stems: vocals, drums, bass, and other. Higher quality than UMX-HQ, with roughly 3x the parameters (28.3M vs. 8.9M). Runs at 4.8x real-time on an M2 Max.
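As a rough guide to what the 4.8x figure means in practice, here is a minimal sketch. It assumes the real-time factor is audio duration divided by processing time (the usual convention, not confirmed by this card), and `estimatedProcessingSeconds` is a hypothetical helper, not part of speech-swift.

```swift
import Foundation

/// Estimated wall-clock processing time for a clip.
/// Assumption: real-time factor = audio duration / processing time.
func estimatedProcessingSeconds(audioSeconds: Double, realTimeFactor: Double) -> Double {
    precondition(realTimeFactor > 0, "real-time factor must be positive")
    return audioSeconds / realTimeFactor
}

let songSeconds = 240.0  // a 4-minute track
let estimate = estimatedProcessingSeconds(audioSeconds: songSeconds, realTimeFactor: 4.8)
print(String(format: "%.0f s", estimate))  // prints "50 s"
```

Actual throughput will vary with the chip, thermal state, and clip length.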
| Target | UMX-HQ (8.9M) | UMX-L (28.3M) |
|---|---|---|
| Vocals | 6.23 dB | ~8.5 dB |
| Drums | 6.44 dB | ~7.0 dB |
| Bass | 4.56 dB | ~5.5 dB |
| Other | 3.41 dB | ~4.5 dB |
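To make the per-stem differences easier to read, the sketch below computes the gain of UMX-L over UMX-HQ from the table values. The UMX-L numbers are the approximate ("~") figures from the table, so the results are approximate as well.

```swift
import Foundation

// Scores in dB copied from the table above; the UMX-L column is approximate.
let scores: [(target: String, umxHQ: Double, umxL: Double)] = [
    ("Vocals", 6.23, 8.5),
    ("Drums", 6.44, 7.0),
    ("Bass", 4.56, 5.5),
    ("Other", 3.41, 4.5),
]

// Per-stem gain of the large model over UMX-HQ.
let gains = scores.map { (target: $0.target, gain: $0.umxL - $0.umxHQ) }
for entry in gains {
    print("\(entry.target): +\(String(format: "%.2f", entry.gain)) dB")
}

// Unweighted mean gain across the four stems.
let meanGain = gains.map { $0.gain }.reduce(0, +) / Double(gains.count)
print("Mean: +\(String(format: "%.2f", meanGain)) dB")
```

On these figures the largest gain is on vocals (about 2.3 dB) and the mean gain is roughly 1.2 dB.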
Used by speech-swift.

From the command line:

    audio separate song.wav --model l

From Swift:

    let separator = try await SourceSeparator.fromPretrained(
        modelId: SourceSeparator.largeModelId)
    let stems = separator.separate(audio: stereoAudio, sampleRate: 44100)
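A common follow-up to separation is building an instrumental (karaoke) track by summing the non-vocal stems. The sketch below shows only that mixing step on plain `[Float]` buffers. The buffer layout and the `mix` helper are assumptions for illustration; the actual type returned by `separate` is not documented here and may differ.

```swift
/// Sums equal-length sample buffers element-wise.
/// Hypothetical helper, not part of speech-swift.
func mix(_ buffers: [[Float]]) -> [Float] {
    guard let first = buffers.first else { return [] }
    precondition(buffers.allSatisfy { $0.count == first.count },
                 "buffers must have equal length")
    var out = [Float](repeating: 0, count: first.count)
    for buffer in buffers {
        for i in out.indices { out[i] += buffer[i] }
    }
    return out
}

// Placeholder samples standing in for one channel of each separated stem.
let drums: [Float] = [0.10, 0.20, -0.10]
let bass: [Float] = [0.05, -0.05, 0.00]
let other: [Float] = [0.00, 0.10, 0.10]

// Instrumental = everything except vocals.
let instrumental = mix([drums, bass, other])
print(instrumental)
```

Summed stems can exceed the [-1, 1] range, so a real pipeline would likely check for clipping before writing the result to disk.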
Files:

- `vocals.safetensors`: Vocals model (108 MB)
- `drums.safetensors`: Drums model (108 MB)
- `bass.safetensors`: Bass model (108 MB)
- `other.safetensors`: Other/accompaniment model (108 MB)
- `config.json`: Model configuration

License: MIT (same as original Open-Unmix)
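If you download the weights manually, a quick check that all five files listed above are present can save a confusing load error later. This is a minimal sketch using Foundation's `FileManager`. The `missingFiles` helper and the directory path are hypothetical, and speech-swift may manage its own download cache differently.

```swift
import Foundation

// File names taken from the listing above.
let expectedFiles = [
    "vocals.safetensors",
    "drums.safetensors",
    "bass.safetensors",
    "other.safetensors",
    "config.json",
]

/// Returns the expected file names that are not present in `directory`.
func missingFiles(in directory: URL, expected: [String] = expectedFiles) -> [String] {
    expected.filter { name in
        !FileManager.default.fileExists(atPath: directory.appendingPathComponent(name).path)
    }
}

// Placeholder path; replace with the local model directory.
let modelDirectory = URL(fileURLWithPath: "/path/to/model")
let missing = missingFiles(in: modelDirectory)
print(missing.isEmpty ? "All files present" : "Missing: \(missing.joined(separator: ", "))")
```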