VibeVoice ASR (Qwen2-7B, 8.3B params) converted to CoreML with INT8 quantization, using a fused LM+head, a fused encoder, and a fused projector. Supports 50+ languages and 60-minute single-pass transcription.
Add vibevoice-coreml to your Swift package. Models auto-download from this repo on first use.
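import Foundation
import VibeVoiceCoreML
// Placeholder path to a local audio file; replace with your own.
let audioURL = URL(fileURLWithPath: "audio.wav")
// Initialization is async and can throw (models download on first use).
let stt = try await SpeechToText()
let result = try await stt.transcribe(audioURL)
print(result.text)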
See the GitHub repo for CLI usage, Python pipelines, and conversion scripts.
… ct.StateType for stateful models)
.mlmodelc: no on-device compilation needed

Files:
fused_encoder.mlmodelc
fused_projector.mlmodelc
lm_decoder_fused_int8.mlmodelc
embed_tokens.bin
tokenizer.json
tokenizer_config.json

License: MIT (same as upstream VibeVoice models from Microsoft)
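
If you manage the model files yourself rather than relying on auto-download, a quick check that all six files named above are present can save a confusing load failure. The sketch below is only an illustration: `missingModelFiles` is a hypothetical helper that is not part of VibeVoiceCoreML, and the directory path is a placeholder, since this page does not say where downloaded models are stored.

```swift
import Foundation

// File names taken from the list above.
let expectedModelFiles = [
    "fused_encoder.mlmodelc",
    "fused_projector.mlmodelc",
    "lm_decoder_fused_int8.mlmodelc",
    "embed_tokens.bin",
    "tokenizer.json",
    "tokenizer_config.json",
]

/// Returns the expected file names that are missing from `directory`.
/// Hypothetical helper for illustration; not part of the VibeVoiceCoreML API.
func missingModelFiles(in directory: URL,
                       fileManager: FileManager = .default) -> [String] {
    expectedModelFiles.filter { name in
        // Compiled .mlmodelc bundles are directories; fileExists(atPath:)
        // returns true for both regular files and directories.
        !fileManager.fileExists(atPath: directory.appendingPathComponent(name).path)
    }
}

// Usage: the path below is an assumed placeholder, not a documented location.
let modelDirectory = URL(fileURLWithPath: "/path/to/vibevoice-models")
let missing = missingModelFiles(in: modelDirectory)
print(missing.isEmpty ? "All model files present" : "Missing: \(missing)")
```

The check only confirms that the names exist on disk; it does not validate that a bundle is complete or loadable by CoreML.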