MLX-converted fp16 weights for microsoft/VibeVoice-1.5B.
For inference code, benchmarks, and documentation, see [vibevoice-mlx](https://github.com/gafiatulin/vibevoice-mlx).
```bash
git clone https://github.com/gafiatulin/vibevoice-mlx && cd vibevoice-mlx
uv sync

# Basic synthesis (weights download automatically)
uv run vibevoice-mlx --text "Hello, world!" --output hello.wav

# Voice cloning
uv run vibevoice-mlx \
  --ref-audio speaker.wav --text "Clone this voice" --output cloned.wav
```
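To synthesize several lines in one go, the same CLI can be driven from a small Python script. The following is a minimal sketch, not part of vibevoice-mlx itself: the helper names `build_command` and `synthesize_all` and the `line_NN.wav` output naming are hypothetical, and only the `--text`, `--output`, and `--ref-audio` flags shown above are used. It defaults to a dry run that just returns the commands.

```python
import subprocess


def build_command(text, output, ref_audio=None):
    """Build the argument list for one `uv run vibevoice-mlx` call."""
    cmd = ["uv", "run", "vibevoice-mlx"]
    if ref_audio is not None:
        # Voice cloning: pass the reference recording, as in the example above.
        cmd += ["--ref-audio", ref_audio]
    cmd += ["--text", text, "--output", output]
    return cmd


def synthesize_all(lines, ref_audio=None, dry_run=True):
    """Create one command per input line; execute them only if dry_run is False."""
    commands = [
        build_command(text, f"line_{i:02d}.wav", ref_audio)  # hypothetical file naming
        for i, text in enumerate(lines, start=1)
    ]
    if not dry_run:
        for cmd in commands:
            # Must be run from inside the vibevoice-mlx checkout after `uv sync`.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in synthesize_all(["Hello, world!", "Clone this voice"], ref_audio="speaker.wav"):
        print(" ".join(cmd))
```

Passing `dry_run=False` runs each command in turn and stops at the first failure, since `check=True` raises on a non-zero exit code.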
Benchmarked on an M4 Max (64 GB) with voice cloning (~30 s of audio):
| Config | RTF | Gen time | Peak memory |
|---|---|---|---|
| fp16 | 1.85x | 15.5s | 8.5 GB |
| int8 | 2.65x | 10.3s | 5.7 GB |
| int4 | 2.72x | 9.7s | 4.6 GB |
| int8, no-semantic | 3.42x | 7.8s | 4.4 GB |
| int4, no-semantic | 3.92x | 6.4s | 3.3 GB |
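The table can be cross-checked with a little arithmetic. The sketch below assumes that RTF is reported as a speed factor (seconds of audio produced per second of generation), which the source does not state explicitly but which matches the "~30s audio" note. Under that assumption, RTF multiplied by generation time gives the implied audio length. The function name `summarize` is illustrative only.

```python
# Rows copied from the benchmark table:
# (config, RTF, generation time in seconds, peak memory in GB).
ROWS = [
    ("fp16", 1.85, 15.5, 8.5),
    ("int8", 2.65, 10.3, 5.7),
    ("int4", 2.72, 9.7, 4.6),
    ("int8, no-semantic", 3.42, 7.8, 4.4),
    ("int4, no-semantic", 3.92, 6.4, 3.3),
]


def summarize(rows):
    """Derive per-config figures relative to the first row (fp16)."""
    _, _, base_gen, base_mem = rows[0]
    summary = []
    for config, rtf, gen_s, mem_gb in rows:
        summary.append({
            "config": config,
            # Assumption: RTF = audio seconds / generation seconds.
            "implied_audio_s": rtf * gen_s,
            "speedup_vs_fp16": base_gen / gen_s,
            "mem_saving_vs_fp16": 1 - mem_gb / base_mem,
        })
    return summary


for row in summarize(ROWS):
    print(
        f"{row['config']:<20} {row['implied_audio_s']:5.1f} s audio  "
        f"{row['speedup_vs_fp16']:.2f}x vs fp16  "
        f"{row['mem_saving_vs_fp16']:.0%} less memory"
    )
```

Every row implies between 25 and 29 seconds of audio, so the speed-factor reading of RTF is consistent with the stated clip length, and the generated length appears to vary slightly between configurations.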