metadata
language:
  - zh
  - en
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-speech
tags:
  - mlx
  - tts
  - speech
  - voice-conditioned
  - long-form
  - diffusion
  - apple-silicon
  - quantized
  - 8bit

VibeVoice — MLX

VibeVoice converted and quantized for native MLX inference on Apple Silicon.

VibeVoice is a hybrid LLM + diffusion architecture built for long-form speech and voice-conditioned generation. It supports both greedy and sampled decoding and aims for natural-sounding output on long-form text.

Variants

| Path | Precision |
| --- | --- |
| mlx-int8/ | int8 quantized weights |
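
For readers unfamiliar with int8 weights, the sketch below illustrates the general idea of affine 8-bit quantization: a group of floats is stored as integer codes plus a scale and an offset. It is a plain-Python illustration only, not the conversion code used for this checkpoint; the function names are hypothetical, and the grouping and rounding actually used in `mlx-int8/` are not documented here.

```python
def quantize_group(values, bits=8):
    """Map a group of floats to integer codes in [0, 2**bits - 1] (illustrative)."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are equal, to avoid dividing by zero.
    scale = (hi - lo) / levels or 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Recover approximate floats from integer codes, scale, and offset."""
    return [c * scale + bias for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the trade-off for storing one byte per weight.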

How to Get Started

Via mlx-speech:

python scripts/generate_vibevoice.py \
  --text "Hello from VibeVoice." \
  --output outputs/vibevoice.wav

Or load the model from Python:

from mlx_speech.generation import VibeVoiceModel

model = VibeVoiceModel.from_path("mlx-int8")
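
To synthesize several lines in one go, the command-line script shown above can be wrapped in a small helper. This is a minimal sketch that assumes it runs from the root of an mlx-speech checkout; the script path and the `--text` and `--output` flags come from the command above, while `build_command`, `generate_all`, and the output file naming are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script path as used in the command above (relative to the mlx-speech checkout).
SCRIPT = Path("scripts/generate_vibevoice.py")


def build_command(text: str, output: Path) -> list[str]:
    """Return the argv list for one generate_vibevoice.py call."""
    return [sys.executable, str(SCRIPT), "--text", text, "--output", str(output)]


def generate_all(lines: list[str], out_dir: Path) -> list[Path]:
    """Run the script once per line and return the output paths (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, text in enumerate(lines):
        output = out_dir / f"vibevoice_{index:03d}.wav"
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_command(text, output), check=True)
        outputs.append(output)
    return outputs
```

Calling `generate_all(["First line.", "Second line."], Path("outputs"))` would then write one WAV file per input line. Each call starts a new process and reloads the model, so this suits small batches rather than high-throughput use.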

Model Details

VibeVoice uses a 9B-parameter hybrid architecture that combines a Qwen2 language model backbone with a continuous diffusion acoustic decoder. The weights are converted to MLX with explicit weight remapping, so PyTorch is not required at inference time.

See mlx-speech for the full runtime and conversion code.

License

Apache 2.0.