gafiatulin / vibevoice-asr-coreml

VibeVoice ASR (Qwen2-7B backbone, 8.3B parameters) converted to CoreML with INT8 quantization, a fused LM + head, a fused encoder, and a fused projector. Supports 50+ languages and transcribes up to 60 minutes of audio in a single pass.

Usage

Add vibevoice-coreml to your Swift package. Models auto-download from this repo on first use.

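```swift
import Foundation
import VibeVoiceCoreML

// Hypothetical input path; point this at your own audio file.
let audioURL = URL(fileURLWithPath: "meeting.wav")

// Models are downloaded from this repo on first use.
let stt = try await SpeechToText()
let result = try await stt.transcribe(audioURL)
print(result.text)
```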

See the GitHub repo for CLI usage, Python pipelines, and conversion scripts.

Requirements

macOS 15+ (the stateful models, built with coremltools' ct.StateType, require it)
Pre-compiled .mlmodelc bundles, so no on-device compilation is needed
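
If an app also has to run on older systems, the macOS 15 requirement can be checked at runtime before any model is loaded. This is a minimal sketch using Swift's standard `#available` condition; the function name `supportsStatefulModels` is hypothetical and not part of the package.

```swift
/// Returns true when the running OS can load stateful Core ML models,
/// which need macOS 15 or later. On other platforms the `*` clause
/// makes this return true, so it only gates macOS versions.
func supportsStatefulModels() -> Bool {
    if #available(macOS 15, *) {
        return true
    }
    return false
}

if supportsStatefulModels() {
    print("macOS 15+ detected: stateful models can load")
} else {
    print("VibeVoice CoreML models require macOS 15 or later")
}
```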

Files

Models

fused_encoder.mlmodelc
fused_projector.mlmodelc
lm_decoder_fused_int8.mlmodelc

Data

embed_tokens.bin

Extras

tokenizer.json
tokenizer_config.json
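
The file names suggest the usual speech-LLM chain: the encoder turns audio into features, the projector maps those features into the LM's embedding space, embed_tokens.bin supplies token embeddings for the decoder's input, and the fused LM + head predicts the next token. The sketch below illustrates that flow with greedy decoding, assuming this layout; plain closures stand in for the Core ML models, and every name in it (`Pipeline`, `encode`, `project`, `embed`, `step`) is hypothetical rather than the package's real API. The package's actual prompt format and decoding strategy may differ.

```swift
// Hypothetical sketch: closures stand in for the three Core ML models and
// for a row lookup into embed_tokens.bin. Not the package's real API.
typealias Vector = [Float]

struct Pipeline {
    var encode: ([Float]) -> [Vector]   // stand-in for fused_encoder
    var project: ([Vector]) -> [Vector] // stand-in for fused_projector
    var embed: (Int) -> Vector          // stand-in for an embed_tokens.bin lookup
    var step: ([Vector]) -> Int         // stand-in for lm_decoder_fused_int8 (next token id)
    var eosToken: Int
    var maxTokens: Int

    /// Greedy decoding: emit tokens until EOS or the length cap.
    func transcribe(_ samples: [Float], prompt: [Int]) -> [Int] {
        // Assumed layout: prompt embeddings, then projected audio features.
        var inputs: [Vector] = prompt.map(embed) + project(encode(samples))
        var output: [Int] = []
        while output.count < maxTokens {
            let token = step(inputs)
            if token == eosToken { break }
            output.append(token)
            // Re-feed the whole sequence for simplicity; a stateful decoder
            // would only need the newest embedding.
            inputs.append(embed(token))
        }
        return output
    }
}

// Toy stand-ins so the sketch runs without any models.
let pipeline = Pipeline(
    encode: { samples in
        stride(from: 0, to: samples.count, by: 2).map { [samples[$0]] }
    },
    project: { features in features.map { row in row.map { $0 * 2 } } },
    embed: { token in [Float(token)] },
    step: { inputs in inputs.count >= 6 ? 0 : inputs.count },
    eosToken: 0,
    maxTokens: 10
)
let tokens = pipeline.transcribe([0.1, 0.2, 0.3, 0.4], prompt: [7])
print(tokens) // [3, 4, 5]
```

With a stateful decoder, the attention cache presumably lives in Core ML model state, so each step would pass only the newest token embedding instead of re-sending the whole sequence as this sketch does.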

License

MIT (same as upstream VibeVoice models from Microsoft)
