VibeVoice 1.5B (MLX)

MLX-converted fp16 weights for microsoft/VibeVoice-1.5B.

For inference code, benchmarks, and documentation, see vibevoice-mlx (https://github.com/gafiatulin/vibevoice-mlx).

Quick start

git clone https://github.com/gafiatulin/vibevoice-mlx && cd vibevoice-mlx
uv sync

# Basic synthesis (weights download automatically)
uv run vibevoice-mlx --text "Hello, world!" --output hello.wav

# Voice cloning
uv run vibevoice-mlx \
  --ref-audio speaker.wav --text "Clone this voice" --output cloned.wav
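To synthesize several sentences in one pass, you can wrap the CLI in a small loop. The sketch below is only an illustration. `synthesize_lines`, `VIBEVOICE_CMD`, and the `line_NNN.wav` naming are hypothetical and not part of vibevoice-mlx. The script uses only the `--text`, `--output`, and `--ref-audio` flags shown above. Each line starts a fresh process, so the weights are loaded again for every line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vibevoice-mlx): synthesize each non-empty
# line of a text file into its own WAV using only the flags shown above.
# Usage: synthesize_lines script.txt [speaker.wav]
set -euo pipefail

# Command to run; left unquoted below on purpose so it splits into words.
# Override it (e.g. VIBEVOICE_CMD=echo) for a dry run.
VIBEVOICE_CMD=${VIBEVOICE_CMD:-"uv run vibevoice-mlx"}

synthesize_lines() {
  local input=$1 ref=${2:-}   # text file, optional reference audio
  local n=0 line out
  # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then
      continue                # skip blank lines
    fi
    n=$((n + 1))
    out=$(printf 'line_%03d.wav' "$n")
    # </dev/null stops the CLI from consuming the rest of the input file.
    if [ -n "$ref" ]; then
      $VIBEVOICE_CMD --ref-audio "$ref" --text "$line" --output "$out" </dev/null
    else
      $VIBEVOICE_CMD --text "$line" --output "$out" </dev/null
    fi
  done < "$input"
}
```

For example, `synthesize_lines script.txt speaker.wav` clones the voice in speaker.wav for every line. `VIBEVOICE_CMD=echo synthesize_lines script.txt` prints the commands without running them.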

Performance

Benchmarked on an Apple M4 Max with 64 GB of memory, using voice cloning to generate ~30 s of audio:

Config              RTF     Gen time   Peak memory
fp16                1.85x   15.5 s     8.5 GB
int8                2.65x   10.3 s     5.7 GB
int4                2.72x    9.7 s     4.6 GB
int8, no-semantic   3.42x    7.8 s     4.4 GB
int4, no-semantic   3.92x    6.4 s     3.3 GB

RTF: real-time factor (higher is faster). Gen time: time taken to generate the clip.
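The RTF column appears to count seconds of audio produced per second of generation. Multiplying RTF by generation time gives roughly 25 to 29 s for every row, which matches the ~30 s clips. The sketch below assumes that reading. It uses the table values to recompute the implied clip length and each configuration's peak-memory saving relative to fp16. `table_stats` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Derive two quantities from the benchmark table:
#   implied audio length = RTF x generation time (assumes RTF = audio s / gen s)
#   memory saving (%)    = 100 * (1 - peak_mem / fp16 peak_mem)
set -euo pipefail

table_stats() {
  # Fields: config | RTF | generation time (s) | peak memory (GB)
  awk -F'|' '
    NR == 1 { base_mem = $4 }              # first row is the fp16 baseline
    {
      audio  = $2 * $3                     # seconds of audio produced
      saving = 100 * (1 - $4 / base_mem)   # percent less peak memory than fp16
      printf "%s|%.1f|%.0f\n", $1, audio, saving
    }' <<'EOF'
fp16|1.85|15.5|8.5
int8|2.65|10.3|5.7
int4|2.72|9.7|4.6
int8, no-semantic|3.42|7.8|4.4
int4, no-semantic|3.92|6.4|3.3
EOF
}

table_stats
```

The script prints one `config|implied audio (s)|memory saving (%)` row per configuration. For example, the int4, no-semantic row works out to about 25.1 s of audio and a 61% smaller peak footprint than fp16.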