# VibeVoice 1.5 CoreML

VibeVoice 1.5B text-to-speech model converted to Apple CoreML format for on-device inference on macOS and iOS.

## Model Components

| Component | File | Description |
|-----------|------|-------------|
| Acoustic Encoder | `vibevoice_acoustic_encoder.mlpackage` | Encodes audio to an acoustic latent (fixed input: 24000 samples @ 24 kHz) |
| Acoustic Decoder | `vibevoice_acoustic_decoder.mlpackage` | Decodes an acoustic latent to audio |
| Semantic Encoder | `vibevoice_semantic_encoder.mlpackage` | Encodes audio to a semantic latent (fixed input: 24000 samples @ 24 kHz) |
| Acoustic Connector | `vibevoice_acoustic_connector.mlpackage` | Projects the acoustic latent into the LLM hidden space |
| Semantic Connector | `vibevoice_semantic_connector.mlpackage` | Projects the semantic latent into the LLM hidden space |
| LLM | `vibevoice_llm.mlpackage` | Qwen2-1.5B-based language model |
| Diffusion Head | `vibevoice_diffusion_head.mlpackage` | Performs a single diffusion denoising step (called once per diffusion step) |

## Requirements

- **Platform**: macOS 14.0+ or iOS 17.0+
- **Framework**: CoreML. Apple Silicon is recommended so CoreML can use the Neural Engine; Intel Macs have no Neural Engine and run the models on CPU/GPU.

## Usage

### Swift (iOS/macOS)

Use `VibeVoicePipeline.swift` with a directory containing all `.mlpackage` files:

```swift
import Foundation

// VibeVoicePipeline is defined in VibeVoicePipeline.swift; add it to your target.
let modelDir = URL(fileURLWithPath: "/path/to/models")
let pipeline = try VibeVoicePipeline(modelDirectory: modelDir)
// Encode, run LLM, diffusion, decode...
```

### Python (macOS only)

CoreML models can only be loaded and run for prediction on macOS:

```bash
python inference.py --models-dir ./models --text "Hello world"
```

## Configuration

- **Audio**: 24 kHz, mono; encoder input is fixed at 24000 samples (1 second). Trim or pad input before encoding.
- **Diffusion**: 20 steps, cosine beta schedule, v_prediction.
- **LLM**: hidden size 1536, 28 layers (Qwen2-1.5B-based).

See `vibevoice_config.json` and `vibevoice_pipeline_config.json` for full settings.

## Conversion Notes

Models were converted with coremltools 9.0.
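
The conversion produces one `.mlpackage` bundle per component, and both the Swift pipeline and `inference.py` expect all of them in a single directory. The following is a minimal Python pre-flight check, offered as a sketch: `REQUIRED_PACKAGES` and `missing_packages` are hypothetical helpers, not part of this repository, and the file names are copied from the Model Components table.

```python
from pathlib import Path

# File names copied from the Model Components table.
REQUIRED_PACKAGES = (
    "vibevoice_acoustic_encoder.mlpackage",
    "vibevoice_acoustic_decoder.mlpackage",
    "vibevoice_semantic_encoder.mlpackage",
    "vibevoice_acoustic_connector.mlpackage",
    "vibevoice_semantic_connector.mlpackage",
    "vibevoice_llm.mlpackage",
    "vibevoice_diffusion_head.mlpackage",
)


def missing_packages(model_dir) -> list:
    """Return the required .mlpackage names that are absent from model_dir.

    An .mlpackage is a directory bundle, so presence is checked with is_dir().
    """
    root = Path(model_dir)
    return [name for name in REQUIRED_PACKAGES if not (root / name).is_dir()]
```

Calling `missing_packages("./models")` before starting inference gives an empty list when everything is in place. On macOS, each package can then be loaded with `coremltools.models.MLModel(str(path))`.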
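
Because the acoustic and semantic encoders were converted with a fixed 24000-sample input (1 second at 24 kHz), audio has to be trimmed or zero-padded on the host before encoding. The sketch below assumes a mono float waveform that has already been resampled to 24 kHz. The helper names and the `(1, 1, 24000)` batch/channel layout are assumptions; check the actual input name and shape in each model's spec before relying on them.

```python
import numpy as np

ENCODER_SAMPLES = 24000  # fixed encoder input length: 1 s of audio at 24 kHz


def fit_to_encoder_window(audio, n_samples: int = ENCODER_SAMPLES) -> np.ndarray:
    """Trim or zero-pad a mono waveform to exactly n_samples float32 values."""
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    if audio.shape[0] >= n_samples:
        return audio[:n_samples]  # trim: keep only the first n_samples
    # pad with trailing zeros (silence) up to the fixed length
    return np.pad(audio, (0, n_samples - audio.shape[0]))


def to_encoder_input(audio) -> np.ndarray:
    """Return a (1, 1, n_samples) array; this batch/channel layout is an ASSUMPTION."""
    return fit_to_encoder_window(audio)[np.newaxis, np.newaxis, :]


if __name__ == "__main__":
    half_second = np.zeros(12000, dtype=np.float32)
    print(to_encoder_input(half_second).shape)  # (1, 1, 24000)
```

Padding at the end keeps the real samples aligned with the start of the window; padding symmetrically would be an equally valid choice for this sketch.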
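
The diffusion head is described as performing single-step denoising, so the 20-step loop from the Configuration section presumably runs on the host. The following sketch shows one way to drive such a loop with a cosine beta schedule and v-prediction. Several details are assumptions and are labeled in the code: 1000 training timesteps, evenly spaced inference timesteps, and a plain deterministic DDIM update (the original pipeline may use a different solver, such as a DPM-Solver variant). `predict_v` is a hypothetical callable standing in for a call to `vibevoice_diffusion_head.mlpackage` with its conditioning already bound.

```python
import math
from typing import Callable, Tuple

import numpy as np


def cosine_betas(num_train_steps: int = 1000, max_beta: float = 0.999) -> np.ndarray:
    """Standard cosine beta schedule; 1000 training steps is an ASSUMPTION."""
    def alpha_bar(t: float) -> float:
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    return np.array([
        min(1.0 - alpha_bar((i + 1) / num_train_steps) / alpha_bar(i / num_train_steps), max_beta)
        for i in range(num_train_steps)
    ])


def sample(
    predict_v: Callable[[np.ndarray, int], np.ndarray],  # HYPOTHETICAL diffusion-head wrapper
    shape: Tuple[int, ...],
    num_steps: int = 20,
    num_train_steps: int = 1000,
    seed: int = 0,
) -> np.ndarray:
    """Deterministic DDIM sampling with a v-prediction model (solver choice is an ASSUMPTION)."""
    alphas_cumprod = np.cumprod(1.0 - cosine_betas(num_train_steps))
    # evenly spaced timesteps from noisiest to cleanest: 999, ..., 0
    timesteps = np.linspace(num_train_steps - 1, 0, num_steps).round().astype(int)
    x = np.random.default_rng(seed).standard_normal(shape)  # start from pure noise
    for i, t in enumerate(timesteps):
        ab_t = alphas_cumprod[t]
        # after the final step the target is the clean sample, i.e. alpha_bar = 1
        ab_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        v = predict_v(x, int(t))
        x0 = math.sqrt(ab_t) * x - math.sqrt(1.0 - ab_t) * v   # predicted clean sample
        eps = math.sqrt(1.0 - ab_t) * x + math.sqrt(ab_t) * v  # predicted noise
        x = math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * eps
    return x
```

Under v-prediction the model outputs `v = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x0`. Together with `x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps`, this is why both `x0` and `eps` can be recovered from `(x_t, v)` with the two lines inside the loop.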

Acoustic and semantic encoders use fixed-length (24000-sample) inputs; the LLM was exported via `torch.export` with custom op registration. See `CONVERSION_RESULTS.md` for details.

## License

Refer to the original VibeVoice model license.