Kokoro TTS CoreML: Runtime Assets
This repository contains the runtime assets needed to run Kokoro TTS fully on-device using CoreML.
It includes phoneme resources, G2P models, POS tagging models, and the CoreML TTS model bundles used for synthesis.
Reference implementation: https://github.com/philipdaquin/Kokoro-tts-coreml
Directory Structure
original/ # Original reference files and conversion artifacts
EspeakData/ # eSpeak phoneme + dictionary resources
g2p/ # Grapheme-to-Phoneme models
POSModels/ # Part-of-speech tagging models
TTSModels/ # CoreML TTS models (.mlpackage)
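After cloning, a quick check can confirm that the layout above is intact before any models are loaded. The following is a minimal Python sketch, assuming the five folders sit directly under the repository root as listed; the helper name `missing_asset_dirs` is hypothetical and not part of the reference implementation.

```python
from pathlib import Path

# Folder names taken from the directory listing above.
EXPECTED_DIRS = ("original", "EspeakData", "g2p", "POSModels", "TTSModels")


def missing_asset_dirs(repo_root):
    """Return the expected top-level folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_asset_dirs(".")` from the repository root returns an empty list when every folder is present, and otherwise the names of the folders that still need to be downloaded.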
Overview
Kokoro TTS CoreML runs as a multi-stage pipeline:
- Text normalization
- G2P conversion
- POS tagging (context refinement)
- Duration prediction
- Waveform synthesis
All components required for fully offline speech synthesis are included here.
Folder Details
EspeakData/
Contains phoneme definitions, language dictionaries, and pronunciation mappings used during G2P processing.
g2p/
Grapheme-to-phoneme (G2P) conversion models.
They convert normalized text into phoneme sequences before duration prediction.
POSModels/
Part-of-speech models used to refine pronunciation and contextual prosody.
TTSModels/
Contains the CoreML models used for synthesis:
- Duration model
- HAR decoder buckets
- Vocoder variants
- Feature / F0 variants
These .mlpackage bundles are optimized for Apple Silicon and Apple Neural Engine (ANE) acceleration.
Architecture
Kokoro CoreML uses a two-stage inference pipeline:
Stage 1: Duration Model (CPU/GPU)
- Variable-length text input
- Transformer + LSTM layers
- Outputs phoneme durations + intermediate features
Stage 2: HAR Decoder (ANE Optimized)
- Fixed-size synthesis buckets
- iSTFTNet vocoder architecture
- 24 kHz waveform output
- ~17× faster than real time on supported devices
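To put the last two figures in perspective: at 24 kHz each second of audio is 24,000 samples, and a speed-up of about 17× implies roughly 1/17 of a second of decoder compute per second of audio. The sketch below works through that arithmetic in Python. The 17× value is the approximate figure quoted above and will vary by device; the function names are illustrative only.

```python
SAMPLE_RATE_HZ = 24_000   # output sample rate stated above
SPEEDUP = 17.0            # approximate "faster than real time" factor quoted above


def samples_for(duration_s):
    """Number of waveform samples for a clip of the given length in seconds."""
    return round(duration_s * SAMPLE_RATE_HZ)


def estimated_synthesis_time(duration_s, speedup=SPEEDUP):
    """Rough compute time in seconds implied by a real-time speed-up factor."""
    return duration_s / speedup
```

For a 10-second clip, `samples_for(10)` gives 240,000 samples and `estimated_synthesis_time(10)` gives about 0.59 s. This estimate ignores model loading and first-inference warm-up.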
Requirements
- iOS 17+ / macOS 14 (Sonoma)+
- Apple Silicon recommended
- ANE-capable hardware for optimal performance
- ~200 MB RAM per loaded model bucket
Integration Notes
- Load models on-demand
- Select synthesis bucket dynamically based on predicted duration
- First inference will be slower (warm-up effect)
- Unload unused models to conserve memory
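The notes above can be sketched as two small pieces of logic: choosing the smallest bucket that fits the predicted duration, and keeping only as many models loaded as a memory budget allows. The Python below is a language-neutral illustration under stated assumptions, not the reference implementation. The bucket lengths in `BUCKET_SECONDS` are hypothetical placeholders, `loader` stands in for whatever call actually loads a CoreML model, and the 200 MB figure is the approximate per-bucket estimate from the requirements.

```python
from collections import OrderedDict

# Hypothetical bucket lengths in seconds; the real sizes depend on the
# .mlpackage bundles shipped in TTSModels/.
BUCKET_SECONDS = (3.0, 10.0, 45.0)
MODEL_MB = 200  # approximate RAM per loaded bucket, from the requirements


def pick_bucket(predicted_s, buckets=BUCKET_SECONDS):
    """Return the smallest bucket that fits the predicted duration."""
    for size in sorted(buckets):
        if predicted_s <= size:
            return size
    raise ValueError(f"{predicted_s:.1f}s exceeds the largest bucket; split the text")


class ModelCache:
    """Loads models on demand and evicts the least recently used one."""

    def __init__(self, loader, budget_mb=400):
        self.loader = loader                        # callable: bucket -> model
        self.capacity = max(1, budget_mb // MODEL_MB)
        self.models = OrderedDict()                 # oldest entry first

    def get(self, bucket):
        if bucket in self.models:
            self.models.move_to_end(bucket)         # mark as most recently used
        else:
            if len(self.models) >= self.capacity:
                self.models.popitem(last=False)     # drop least recently used
            self.models[bucket] = self.loader(bucket)
        return self.models[bucket]
```

With a 400 MB budget the cache holds two buckets at a time; requesting a third evicts whichever was used least recently. Since the first inference after a load is slower, an application may prefer a larger budget over frequent reloads where memory allows.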
Usage
Clone the repository:
git clone https://huggingface.co/<username>/<repo>
Or download via Hugging Face CLI:
huggingface-cli download <username>/<repo> --local-dir .
License
Refer to the original Kokoro TTS license and any included third-party licenses inside their respective folders.
Ensure attribution is preserved if redistributing.
Credits
Based on the Kokoro TTS CoreML conversion project:
https://github.com/philipdaquin/Kokoro-tts-coreml