# VibeVoice 1.5B CoreML
VibeVoice 1.5B text-to-speech model converted to Apple CoreML format for on-device inference on macOS and iOS.
## Model Components

| Component | File | Description |
|---|---|---|
| Acoustic Encoder | `vibevoice_acoustic_encoder.mlpackage` | Encodes audio to an acoustic latent (fixed input: 24000 samples @ 24 kHz) |
| Acoustic Decoder | `vibevoice_acoustic_decoder.mlpackage` | Decodes an acoustic latent back to audio |
| Semantic Encoder | `vibevoice_semantic_encoder.mlpackage` | Encodes audio to a semantic latent (fixed input: 24000 samples) |
| Acoustic Connector | `vibevoice_acoustic_connector.mlpackage` | Projects the acoustic latent into the LLM hidden space |
| Semantic Connector | `vibevoice_semantic_connector.mlpackage` | Projects the semantic latent into the LLM hidden space |
| LLM | `vibevoice_llm.mlpackage` | Qwen2-1.5B-based language model |
| Diffusion Head | `vibevoice_diffusion_head.mlpackage` | Single-step diffusion denoising |
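The table above implies a fixed wiring between components at inference time. A minimal Python sketch of that wiring, assuming all `.mlpackage` files sit in one directory; the exact call order is an assumption inferred from the component descriptions, and loading uses coremltools, which requires macOS:

```python
from pathlib import Path

# Assumed stage order for one generation pass, based on the component table.
STAGES = [
    "vibevoice_semantic_encoder",    # voice-prompt audio -> semantic latent
    "vibevoice_acoustic_encoder",    # voice-prompt audio -> acoustic latent
    "vibevoice_semantic_connector",  # semantic latent -> LLM hidden space
    "vibevoice_acoustic_connector",  # acoustic latent -> LLM hidden space
    "vibevoice_llm",                 # text + speech embeddings -> hidden states
    "vibevoice_diffusion_head",      # hidden states -> denoised acoustic latent
    "vibevoice_acoustic_decoder",    # acoustic latent -> 24 kHz audio
]

def load_models(models_dir: str) -> dict:
    """Load every stage as a CoreML model (macOS only)."""
    import coremltools as ct  # deferred import: only needed when actually loading
    return {
        name: ct.models.MLModel(str(Path(models_dir) / f"{name}.mlpackage"))
        for name in STAGES
    }
```

The per-stage input/output shapes are defined by each `.mlpackage`; inspect them with `model.get_spec()` after loading.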
## Requirements

- Platform: macOS 14.0+ or iOS 17.0+
- Framework: CoreML (Apple Silicon recommended for Neural Engine acceleration; Intel Macs fall back to CPU/GPU)
## Usage

### Swift (iOS/macOS)

Use `VibeVoicePipeline.swift` with a directory containing all `.mlpackage` files:

```swift
let modelDir = URL(fileURLWithPath: "/path/to/models")
let pipeline = try VibeVoicePipeline(modelDirectory: modelDir)
// Encode, run LLM, diffusion, decode...
```
### Python (macOS only)

CoreML models require macOS to load and run:

```bash
python inference.py --models-dir ./models --text "Hello world"
```
## Configuration

- Audio: 24 kHz, 1 channel; encoder input fixed at 24000 samples (1 second). Trim or pad input before encoding.
- Diffusion: 20 steps, cosine beta schedule, v_prediction.
- LLM: 1536 hidden size, 28 layers (Qwen2-1.5B-based).

See `vibevoice_config.json` and `vibevoice_pipeline_config.json` for full settings.
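Because the encoders take exactly 24000 samples, audio must be trimmed or zero-padded before encoding. A minimal sketch of that preprocessing step, assuming mono 24 kHz float samples in a plain Python list (the function name is illustrative, not part of the shipped pipeline):

```python
ENCODER_INPUT_SAMPLES = 24000  # fixed encoder input: 1 second @ 24 kHz

def fit_to_encoder_window(samples: list[float]) -> list[float]:
    """Trim or zero-pad a mono 24 kHz signal to exactly 24000 samples."""
    if len(samples) >= ENCODER_INPUT_SAMPLES:
        return samples[:ENCODER_INPUT_SAMPLES]
    # Pad the tail with silence to reach the fixed window length.
    return samples + [0.0] * (ENCODER_INPUT_SAMPLES - len(samples))
```

Audio longer than one second can be split into consecutive 24000-sample windows and encoded chunk by chunk.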
## Conversion Notes

Conversion was done with coremltools 9.0. Acoustic and semantic encoders use fixed-length (24000-sample) inputs; the LLM was exported via `torch.export` plus custom op registration. See `CONVERSION_RESULTS.md` for details.
## License
Refer to the original VibeVoice model license.