gafiatulin / vibevoice-tts-1.5b-coreml

VibeVoice 1.5B (Qwen2.5-1.5B backbone) converted to Core ML: INT8 quantization, fused LM + head, fused diffusion loop, 10-step DPM-Solver++ sampling. Multi-speaker TTS with voice cloning.

Usage

Add the vibevoice-coreml package to your Swift project. The model files are downloaded automatically from this repository on first use.
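A sketch of the package-manifest wiring. The repository URL, version, and product name here are assumptions based on the package name above; check the GitHub repo for the canonical coordinates.

```swift
// Package.swift — illustrative only; URL and version are assumed.
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "MyTTSApp",
    platforms: [.macOS(.v15)],  // stateful Core ML models need macOS 15+
    dependencies: [
        // Assumed location of the vibevoice-coreml package.
        .package(url: "https://github.com/gafiatulin/vibevoice-coreml", from: "0.1.0"),
    ],
    targets: [
        .executableTarget(
            name: "MyTTSApp",
            dependencies: [.product(name: "VibeVoiceCoreML", package: "vibevoice-coreml")]
        )
    ]
)
```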

import VibeVoiceCoreML

// Load the 1.5B variant (fetches the CoreML assets on first run).
let tts = try await MultispeakerTTS(architecture: .model1_5B)

// Build speaker embeddings from reference audio for voice cloning.
let voices = try await tts.encodeVoices(from: [referenceAudioURL])

// Synthesis is streamed frame by frame.
for try await frame in tts.speak("Hello world", config: MultispeakerConfig(), voices: voices) {
    // frame.samples: [Float] mono PCM at 24kHz
}
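Each streamed frame carries mono Float samples at 24 kHz, so the output can be persisted with a plain WAV writer. A minimal Foundation-only sketch; `writeWAV` is a hypothetical helper, not part of VibeVoiceCoreML:

```swift
import Foundation

/// Hypothetical helper (not part of VibeVoiceCoreML): writes mono Float
/// samples in [-1, 1] to a 16-bit PCM WAV file.
func writeWAV(samples: [Float], sampleRate: Int, to url: URL) throws {
    var data = Data()
    func append32(_ v: UInt32) { withUnsafeBytes(of: v.littleEndian) { data.append(contentsOf: $0) } }
    func append16(_ v: UInt16) { withUnsafeBytes(of: v.littleEndian) { data.append(contentsOf: $0) } }

    let dataSize = UInt32(samples.count * 2)        // 2 bytes per sample, mono

    data.append(contentsOf: Array("RIFF".utf8))
    append32(36 + dataSize)                         // RIFF chunk size
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8))
    append32(16)                                    // fmt chunk size
    append16(1)                                     // audio format: PCM
    append16(1)                                     // channels: mono
    append32(UInt32(sampleRate))
    append32(UInt32(sampleRate * 2))                // byte rate
    append16(2)                                     // block align
    append16(16)                                    // bits per sample
    data.append(contentsOf: Array("data".utf8))
    append32(dataSize)
    for s in samples {
        let clamped = max(-1.0, min(1.0, s))
        append16(UInt16(bitPattern: Int16(clamped * 32767)))
    }
    try data.write(to: url)
}
```

Concatenating every `frame.samples` into one array and passing it with `sampleRate: 24000` yields a playable file.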

See the GitHub repo for CLI usage, Python pipelines, and conversion scripts.

Requirements

  • macOS 15+ (requires ct.StateType for stateful models)
  • Pre-compiled .mlmodelc — no on-device compilation needed

Files

Models

  • lm_decoder_fused_int8.mlmodelc
  • diffusion_loop.mlmodelc
  • vae_decoder_streaming.mlmodelc
  • semantic_encoder_streaming.mlmodelc
  • acoustic_connector.mlmodelc
  • semantic_connector.mlmodelc
  • vae_encoder.mlmodelc

Data

  • embed_tokens.bin

Extras

  • tokenizer.json
  • tokenizer_config.json

License

MIT (same as upstream VibeVoice models from Microsoft)
