VibeVoice 1.5 CoreML

VibeVoice 1.5B text-to-speech model converted to Apple CoreML format for on-device inference on macOS and iOS.

Model Components

| Component | File | Description |
| --- | --- | --- |
| Acoustic Encoder | vibevoice_acoustic_encoder.mlpackage | Encodes audio to acoustic latent (fixed input: 24000 samples at 24 kHz) |
| Acoustic Decoder | vibevoice_acoustic_decoder.mlpackage | Decodes acoustic latent to audio |
| Semantic Encoder | vibevoice_semantic_encoder.mlpackage | Encodes audio to semantic latent (fixed input: 24000 samples at 24 kHz) |
| Acoustic Connector | vibevoice_acoustic_connector.mlpackage | Projects acoustic latent to LLM hidden space |
| Semantic Connector | vibevoice_semantic_connector.mlpackage | Projects semantic latent to LLM hidden space |
| LLM | vibevoice_llm.mlpackage | Qwen2-1.5B-based language model |
| Diffusion Head | vibevoice_diffusion_head.mlpackage | Single-step diffusion denoising |

Requirements

  • Platform: macOS 14.0+ or iOS 17.0+
  • Framework: CoreML (Apple Silicon recommended for Neural Engine acceleration; Intel Macs have no Neural Engine and run on CPU/GPU only)

Usage

Swift (iOS/macOS)

Use VibeVoicePipeline.swift with a directory containing all .mlpackage files:

import Foundation

// Directory that contains all of the .mlpackage files.
let modelDir = URL(fileURLWithPath: "/path/to/models")
let pipeline = try VibeVoicePipeline(modelDirectory: modelDir)
// Encode, run LLM, diffusion, decode...

Python (macOS only)

CoreML models can be loaded and run only on macOS, so run the script on a Mac:

python inference.py --models-dir ./models --text "Hello world"
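
Both the Swift pipeline and inference.py are pointed at a single directory of .mlpackage files, so a quick pre-flight check can save a confusing load error. The sketch below is plain Python and is not part of this repository: the function name missing_packages is hypothetical, and the file names are copied from the Model Components table.

```python
from pathlib import Path

# File names as listed in the Model Components table.
REQUIRED_PACKAGES = [
    "vibevoice_acoustic_encoder.mlpackage",
    "vibevoice_acoustic_decoder.mlpackage",
    "vibevoice_semantic_encoder.mlpackage",
    "vibevoice_acoustic_connector.mlpackage",
    "vibevoice_semantic_connector.mlpackage",
    "vibevoice_llm.mlpackage",
    "vibevoice_diffusion_head.mlpackage",
]


def missing_packages(models_dir):
    """Return the required .mlpackage names that are absent from models_dir."""
    root = Path(models_dir)
    # An .mlpackage is a directory bundle, so check exists() rather than is_file().
    return [name for name in REQUIRED_PACKAGES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_packages("./models")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All model packages found.")
```

Running this before inference.py lists any package that still needs to be downloaded into ./models.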

Configuration

  • Audio: 24 kHz, 1 channel; encoder input fixed at 24000 samples (1 second). Trim or pad input before encoding.
  • Diffusion: 20 steps, cosine beta schedule, v_prediction.
  • LLM: 1536 hidden size, 28 layers (Qwen2-1.5B-based).
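
The trim-or-pad step for the encoders can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes mono samples in a plain Python sequence, keeps the first second when the input is too long, and pads the tail with zeros when it is too short. The name fit_to_encoder_length is hypothetical, and the actual pipeline may trim or pad differently.

```python
ENCODER_SAMPLES = 24000  # fixed encoder input length: 1 second at 24 kHz


def fit_to_encoder_length(samples, target=ENCODER_SAMPLES):
    """Trim or zero-pad a mono sample sequence to exactly `target` samples."""
    samples = list(samples)
    if len(samples) >= target:
        # Assumption: keep the first `target` samples and drop the rest.
        return samples[:target]
    # Assumption: pad the end with silence (zeros).
    return samples + [0.0] * (target - len(samples))
```

For example, a 0.5 second clip (12000 samples) comes back as 24000 samples whose second half is silence.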

See vibevoice_config.json and vibevoice_pipeline_config.json for full settings.
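
To make the diffusion settings more concrete, here is a minimal sketch of a cosine beta schedule and of recovering a clean-sample estimate from a v_prediction output. It assumes the widely used squared-cosine form with offset s = 0.008 and a beta cap of 0.999; the exact schedule parameters, and how the 20 sampling steps map onto it, are not stated here and should be taken from the config files. All function names are hypothetical, and the step count of 20 in the usage note is for illustration only.

```python
import math


def cosine_alpha_bar(t, s=0.008):
    """Cumulative signal level alpha_bar at normalized time t in [0, 1]."""
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2


def cosine_betas(num_steps, max_beta=0.999):
    """Per-step betas derived from ratios of consecutive alpha_bar values."""
    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1 - cosine_alpha_bar(t2) / cosine_alpha_bar(t1), max_beta))
    return betas


def x0_from_v(x_t, v, alpha_bar):
    """Clean-sample estimate from a v-prediction.

    With x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps and
    v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0, this returns x0.
    """
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1 - alpha_bar) * v
```

Calling cosine_betas(20) yields 20 betas that grow toward the cap, and x0_from_v works on scalars here; a real pipeline would apply the same formula elementwise to latent arrays.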

Conversion Notes

The models were converted with coremltools 9.0. The acoustic and semantic encoders take fixed-length inputs (24000 samples); the LLM was exported via torch.export with custom op registration. See CONVERSION_RESULTS.md for details.

License

Refer to the original VibeVoice model license.
