---
license: mit
base_model: vibevoice/VibeVoice-7B
tags:
  - mlx
  - tts
  - vibevoice
  - apple-silicon
  - voice-cloning
---

VibeVoice 7B — MLX

MLX-converted fp16 weights for vibevoice/VibeVoice-7B.

For inference code, benchmarks, and documentation, see vibevoice-mlx (https://github.com/gafiatulin/vibevoice-mlx).

Quick start

git clone https://github.com/gafiatulin/vibevoice-mlx && cd vibevoice-mlx
uv sync

# Basic synthesis (weights download automatically)
uv run vibevoice-mlx --model gafiatulin/vibevoice-7b-mlx --text "Hello, world!" --output hello.wav

# Voice cloning with INT8 quantization
uv run vibevoice-mlx --model gafiatulin/vibevoice-7b-mlx --quantize 8 \
  --ref-audio speaker.wav --text "Clone this voice" --output cloned.wav

Performance

Benchmarked on an M4 Max with 64 GB of memory, using voice cloning (~30 s of audio):

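| Config            | RTF   | Gen time | Peak memory |
|-------------------|-------|----------|-------------|
| fp16              | 0.53x | 53.0 s   | 21.7 GB     |
| int8              | 1.06x | 29.6 s   | 14.9 GB     |
| int4              | 1.16x | 25.8 s   | 11.2 GB     |
| int8, no-semantic | 1.24x | 23.3 s   | 13.6 GB     |
| int4, no-semantic | 1.37x | 19.5 s   | 9.8 GB      |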
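
The RTF column is not defined here. The numbers are consistent with RTF being seconds of audio produced per second of generation time, so values above 1x would mean faster than real time. That reading is an assumption inferred from the table, not something this card states. The sketch below multiplies each row's RTF by its generation time to show the implied audio length, which lands close to the ~30 s mentioned above. The function name `implied_audio_seconds` is made up for this illustration.

```python
# Rows copied from the table above: (config, reported RTF, generation time in seconds).
ROWS = [
    ("fp16", 0.53, 53.0),
    ("int8", 1.06, 29.6),
    ("int4", 1.16, 25.8),
    ("int8, no-semantic", 1.24, 23.3),
    ("int4, no-semantic", 1.37, 19.5),
]


def implied_audio_seconds(rtf: float, gen_seconds: float) -> float:
    """Audio length implied by the ASSUMED definition RTF = audio_seconds / gen_seconds."""
    return rtf * gen_seconds


if __name__ == "__main__":
    for config, rtf, gen_seconds in ROWS:
        audio = implied_audio_seconds(rtf, gen_seconds)
        print(f"{config:<18} {audio:5.1f} s of audio implied")
```

Under this reading, only the fp16 configuration runs slower than real time on the benchmarked machine.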