VibeVoice 7B – MLX

MLX-converted fp16 weights for vibevoice/VibeVoice-7B.

For inference code, benchmarks, and documentation, see the vibevoice-mlx repository.

Quick start

```bash
git clone https://github.com/gafiatulin/vibevoice-mlx && cd vibevoice-mlx
uv sync

# Basic synthesis (weights download automatically)
uv run vibevoice-mlx --model gafiatulin/vibevoice-7b-mlx --text "Hello, world!" --output hello.wav

# Voice cloning with INT8 quantization
uv run vibevoice-mlx --model gafiatulin/vibevoice-7b-mlx --quantize 8 \
  --ref-audio speaker.wav --text "Clone this voice" --output cloned.wav
```
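
If you prefer to fetch the weights ahead of the first run (for example on a machine that will be offline later), they can be pulled into the local Hugging Face cache with `huggingface_hub`. This is a minimal sketch using the standard `snapshot_download` API; it is not part of the vibevoice-mlx CLI itself:

```python
# Optional pre-download: fetch the model repo once so later CLI runs
# resolve from the local Hugging Face cache instead of the network.
from huggingface_hub import snapshot_download

# Repo id taken from this card; returns the local cache directory
# containing the safetensors weights and config.
local_dir = snapshot_download("gafiatulin/vibevoice-7b-mlx")
print(local_dir)
```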

Performance

Benchmarked on an M4 Max (64 GB), voice-cloning runs generating roughly 30 s of audio. RTF is the real-time factor (seconds of audio produced per second of wall-clock time), so values above 1.0x are faster than real time:

| Config            | RTF   | Gen time | Peak mem |
|-------------------|-------|----------|----------|
| fp16              | 0.53x | 53.0 s   | 21.7 GB  |
| int8              | 1.06x | 29.6 s   | 14.9 GB  |
| int4              | 1.16x | 25.8 s   | 11.2 GB  |
| int8, no-semantic | 1.24x | 23.3 s   | 13.6 GB  |
| int4, no-semantic | 1.37x | 19.5 s   | 9.8 GB   |
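
As a quick sanity check that the table is internally consistent with this RTF definition, multiplying each RTF by its generation time should recover the ~30 s clip length. The numbers below are taken directly from the table:

```python
# RTF * generation time ~= seconds of audio produced per run.
rows = {
    "fp16": (0.53, 53.0),
    "int8": (1.06, 29.6),
    "int4": (1.16, 25.8),
    "int8, no-semantic": (1.24, 23.3),
    "int4, no-semantic": (1.37, 19.5),
}
for config, (rtf, gen_s) in rows.items():
    print(f"{config:>18}: ~{rtf * gen_s:.1f} s of audio")
```

Every configuration lands in the 26–31 s range, matching the ~30 s benchmark clip.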