VibeVoice CoreML
Collection: VibeVoice models (TTS/STT) converted to CoreML • 4 items
VibeVoice 1.5B (Qwen2.5-1.5B) — CoreML INT8, fused LM+head, fused diffusion loop, DPM-Solver++ 10-step. Multi-speaker TTS with voice cloning.
Add vibevoice-coreml to your Swift package. Models auto-download from this repo on first use.
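A minimal sketch of the dependency declaration in `Package.swift`. The repository URL and version are placeholders, not the actual coordinates; substitute the real ones from the GitHub repo:

```swift
// Package.swift (fragment) — hypothetical URL and version, replace with the real ones
dependencies: [
    .package(url: "https://github.com/<owner>/vibevoice-coreml", from: "0.1.0")
],
targets: [
    .executableTarget(
        name: "MyApp",
        dependencies: [
            // Product name assumed to match the module imported below
            .product(name: "VibeVoiceCoreML", package: "vibevoice-coreml")
        ]
    )
]
```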
```swift
import VibeVoiceCoreML

let tts = try await MultispeakerTTS(architecture: .model1_5B)
let voices = try await tts.encodeVoices(from: [referenceAudioURL])
for try await frame in tts.speak("Hello world", config: MultispeakerConfig(), voices: voices) {
    // frame.samples: [Float] at 24kHz
}
```
See the GitHub repo for CLI usage, Python pipelines, and conversion scripts.
Models are converted with coremltools (`ct.StateType` for stateful models) and ship precompiled as `.mlmodelc` bundles, so no on-device compilation is needed.

Repository contents:
- `lm_decoder_fused_int8.mlmodelc`
- `diffusion_loop.mlmodelc`
- `vae_decoder_streaming.mlmodelc`
- `semantic_encoder_streaming.mlmodelc`
- `acoustic_connector.mlmodelc`
- `semantic_connector.mlmodelc`
- `vae_encoder.mlmodelc`
- `embed_tokens.bin`
- `tokenizer.json`
- `tokenizer_config.json`

License: MIT (same as the upstream VibeVoice models from Microsoft)
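If you need to inspect or load one of the compiled bundles directly with Core ML rather than through the package API, a minimal sketch (the file path and compute-unit choice are assumptions, not part of this repo's API):

```swift
import CoreML
import Foundation

// Load a precompiled .mlmodelc bundle directly (path is illustrative).
// Because the bundle is already compiled, there is no on-device compile step.
let url = URL(fileURLWithPath: "vae_decoder_streaming.mlmodelc")

let config = MLModelConfiguration()
config.computeUnits = .all  // let Core ML schedule across CPU/GPU/Neural Engine

let model = try MLModel(contentsOf: url, configuration: config)

// Print the model's input names to verify the interface before wiring it up.
print(model.modelDescription.inputDescriptionsByName.keys)
```

This is the standard Core ML loading path; the package's `MultispeakerTTS` wrapper above handles this internally.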