VibeVoice-Realtime-0.5B CoreML
CoreML conversion of microsoft/VibeVoice-Realtime-0.5B for iOS 18+.
Model Components
| Model | Original size | Quantized size (4-bit) |
|---|---|---|
| TTS LLM (Stateful) | 829 MB | 208 MB |
| Diffusion Head | 80 MB | 20 MB |
| Acoustic Decoder | 656 MB | 164 MB |
Total: 1,565 MB (original) / 392 MB (quantized 4-bit)
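As a quick sanity check, the totals above can be recomputed from the per-component sizes. The snippet below is a minimal sketch that only re-adds the numbers from the table; the `SIZES_MB` dictionary and `totals` helper are illustrative names, not part of this repository. The resulting ratio is close to 4x, which would be consistent with 16-bit weights reduced to 4 bits, although the original precision is not stated here.

```python
# Sizes in MB, copied from the table above: (original, quantized).
SIZES_MB = {
    "TTS LLM (Stateful)": (829, 208),
    "Diffusion Head": (80, 20),
    "Acoustic Decoder": (656, 164),
}


def totals(sizes):
    """Return (original_total, quantized_total) in MB."""
    original = sum(orig for orig, _ in sizes.values())
    quantized = sum(quant for _, quant in sizes.values())
    return original, quantized


original_total, quantized_total = totals(SIZES_MB)
ratio = original_total / quantized_total
print(f"{original_total} MB -> {quantized_total} MB ({ratio:.2f}x smaller)")
```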
Requirements
- iOS 18+ (the TTS LLM is a stateful Core ML model, which requires iOS 18 or later)
- CoreML framework
Files
- vibevoice_tts_llm_quantized.mlpackage: TTS LLM (recommended)
- vibevoice_diffusion_head_quantized.mlpackage: Diffusion Head
- vibevoice_acoustic_decoder_quantized.mlpackage: Acoustic Decoder
- conversion_report.md: Full conversion details
- benchmark_results.json: Performance metrics
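Before bundling the models into an app, it can help to confirm that all three quantized packages were downloaded. The following is a small sketch using only the Python standard library; the `missing_packages` helper and the idea of a single local model directory are assumptions for illustration, not something this repository ships.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "vibevoice_tts_llm_quantized.mlpackage",
    "vibevoice_diffusion_head_quantized.mlpackage",
    "vibevoice_acoustic_decoder_quantized.mlpackage",
]


def missing_packages(model_dir):
    """Return the expected package names that are absent from model_dir."""
    root = Path(model_dir)
    # An .mlpackage is a directory on disk, so exists() covers it.
    return [name for name in EXPECTED_FILES if not (root / name).exists()]
```

An empty list from `missing_packages("path/to/models")` means all three packages are present.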
Inference Pipeline
- Tokenize text (Qwen2 tokenizer)
- TTS LLM -> hidden states
- Diffusion Head (20 steps) -> acoustic latents
- Acoustic Decoder -> audio waveform (24 kHz)
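The last stage of the pipeline yields a 24 kHz waveform. As a hedged illustration of handling that output off-device, the sketch below writes a sequence of samples to a WAV file with Python's standard `wave` module. It assumes the decoder output is mono and has already been converted to floats in [-1.0, 1.0]; the actual output tensor name, shape, and range are not specified here, and `write_wav` is a hypothetical helper.

```python
import struct
import wave

SAMPLE_RATE_HZ = 24_000  # output rate stated in the pipeline above


def write_wav(path, samples, sample_rate=SAMPLE_RATE_HZ):
    """Write mono float samples as 16-bit PCM and return the duration in seconds.

    Assumption: samples are floats in [-1.0, 1.0]; values outside are clipped.
    """
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        # Scale to signed 16-bit little-endian PCM.
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono (assumed)
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return len(samples) / sample_rate
```

At this sample rate, 24,000 samples correspond to one second of audio, so the returned duration is a convenient check that the decoder produced the expected amount of output.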
Source
Converted from microsoft/VibeVoice-Realtime-0.5B.