MLX-converted Open-Unmix HQ for music source separation on Apple Silicon.
Separates stereo music into 4 stems: vocals, drums, bass, other. 4.3x real-time on M2 Max.
| Target | SDR (dB) |
|---|---|
| Vocals | 6.23 |
| Drums | 6.44 |
| Bass | 4.56 |
| Other | 3.41 |
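
The card does not say which SDR variant or evaluation toolkit produced these figures. As a reference point only, and assuming the basic signal-to-distortion ratio between a reference stem $s$ and its estimate $\hat{s}$ (symbols introduced here, not taken from the card), higher values mean the estimate is closer to the reference:

```latex
\mathrm{SDR}(s, \hat{s}) = 10 \log_{10} \frac{\lVert s \rVert^{2}}{\lVert s - \hat{s} \rVert^{2}} \ \text{dB}
```

Under this simplified definition, 6.23 dB for vocals would correspond to a signal-to-error energy ratio of about $10^{0.623} \approx 4.2$. Benchmark toolkits often use a more elaborate variant, so treat this as a reading aid rather than the exact metric behind the table.
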
Real-time factor (RTF): 0.23 (4.3x real-time) on M2 Max.
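
The two speed figures are the same measurement stated two ways. This assumes the usual definition of real-time factor as processing time divided by audio duration, which the card does not spell out. $T_{\text{proc}}$ and $T_{\text{audio}}$ are symbols introduced here for illustration:

```latex
\mathrm{RTF} = \frac{T_{\text{proc}}}{T_{\text{audio}}} = 0.23
\quad\Longrightarrow\quad
\text{speed-up} = \frac{1}{\mathrm{RTF}} = \frac{1}{0.23} \approx 4.3
```

At this rate, a 3-minute (180 s) track would take roughly $180 \times 0.23 \approx 41$ s to separate on comparable hardware.
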
Used by speech-swift, from the command line:

    audio separate song.wav

or from Swift:

    let separator = try await SourceSeparator.fromPretrained()
    let stems = separator.separate(audio: stereoAudio, sampleRate: 44100)
    // stems[.vocals], stems[.drums], stems[.bass], stems[.other]

- vocals.safetensors: Vocals model (34 MB)
- drums.safetensors: Drums model (34 MB)
- bass.safetensors: Bass model (34 MB)
- other.safetensors: Other/accompaniment model (34 MB)
- config.json: Model configuration

License: MIT (same as the original Open-Unmix).