---
language:
- zh
- en
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-speech
tags:
- mlx
- tts
- speech
- voice-conditioned
- long-form
- diffusion
- apple-silicon
- quantized
- 8bit
---

# VibeVoice (MLX)

VibeVoice, converted and quantized for native MLX inference on Apple Silicon. It is a hybrid LLM + diffusion architecture built for long-form speech and voice-conditioned generation. It supports both greedy and sampled decoding and is intended to produce natural-sounding long-form output.

## Variants

| Path | Precision |
| --- | --- |
| `mlx-int8/` | int8 quantized weights |

## How to Get Started

Via [mlx-speech](https://github.com/appautomaton/mlx-speech):

```bash
python scripts/generate_vibevoice.py \
  --text "Hello from VibeVoice." \
  --output outputs/vibevoice.wav
```

```python
from mlx_speech.generation import VibeVoiceModel

# Load the int8 quantized weights from the mlx-int8/ directory
model = VibeVoiceModel.from_path("mlx-int8")
```

## Model Details

VibeVoice uses a 9B-parameter hybrid architecture that combines a Qwen2 language model backbone with a continuous diffusion acoustic decoder. The weights are converted to MLX with explicit weight remapping, so PyTorch is not needed at inference time.

See [mlx-speech](https://github.com/appautomaton/mlx-speech) for the full runtime and conversion code.

## License

Apache 2.0.