
# Kokoro TTS CoreML – Runtime Assets

This repository contains the runtime assets needed to run Kokoro TTS fully on-device with CoreML.

It includes the phoneme resources, G2P models, POS-tagging models, and CoreML TTS model bundles required for synthesis.

Reference implementation: https://github.com/philipdaquin/Kokoro-tts-coreml


πŸ“¦ Directory Structure

original/ # Original reference files and conversion artifacts
EspeakData/ # eSpeak phoneme + dictionary resources
g2p/ # Grapheme-to-Phoneme models
POSModels/ # Part-of-speech tagging models
TTSModels/ # CoreML TTS models (.mlpackage)


## 🧠 Overview

Kokoro TTS CoreML runs a multi-stage pipeline:

  1. Text normalization
  2. G2P (grapheme-to-phoneme) conversion
  3. POS tagging (context refinement)
  4. Duration prediction
  5. Waveform synthesis (HAR decoder + vocoder)

All components required for fully offline speech synthesis are included here.
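The staged flow above can be sketched as plain function composition. The function names and the per-character "G2P" below are toy stand-ins, not the project's actual API; the real pipeline routes each stage through CoreML model calls:

```python
# Sketch of the synthesis pipeline as function composition.
# All names here are hypothetical stand-ins for the CoreML-backed stages.

def normalize(text: str) -> str:
    # Stage 1: text normalization (whitespace/casing cleanup as a stand-in)
    return " ".join(text.lower().split())

def g2p(text: str) -> list[str]:
    # Stage 2: grapheme-to-phoneme conversion (toy per-character mapping;
    # the real model emits eSpeak-style phoneme sequences)
    return [c for c in text if c.isalpha()]

def synthesize_tokens(text: str) -> list[str]:
    # Chain the early stages; the real pipeline continues with POS tagging,
    # duration prediction, and the HAR decoder.
    return g2p(normalize(text))

print(synthesize_tokens("Hello,  world!"))
# → ['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']
```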


πŸ“‚ Folder Details

EspeakData/

Contains phoneme definitions, language dictionaries, and pronunciation mappings used during G2P processing.

### g2p/

Grapheme-to-phoneme conversion models. These convert normalized text into phoneme sequences before duration prediction.

### POSModels/

Part-of-speech models used to refine pronunciation and contextual prosody.

### TTSModels/

Contains the CoreML models used for synthesis:

  • Duration model
  • HAR decoder buckets
  • Vocoder variants
  • Feature / F0 variants

These `.mlpackage` bundles are optimized for Apple Silicon and Apple Neural Engine (ANE) acceleration.


βš™οΈ Architecture

Kokoro CoreML uses a two-stage inference pipeline:

### Stage 1 – Duration Model (CPU/GPU)

  • Variable-length text input
  • Transformer + LSTM layers
  • Outputs phoneme durations + intermediate features

### Stage 2 – HAR Decoder (ANE Optimized)

  • Fixed-size synthesis buckets
  • iSTFTNet vocoder architecture
  • 24kHz waveform output
  • ~17Γ— faster than real-time on supported devices

πŸš€ Requirements

  • iOS 17+ / macOS Sonoma+
  • Apple Silicon recommended
  • ANE-capable hardware for optimal performance
  • ~200MB RAM per loaded model bucket

πŸ”§ Integration Notes

  • Load models on-demand
  • Select synthesis bucket dynamically based on predicted duration
  • First inference will be slower (warm-up effect)
  • Unload unused models to conserve memory
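The load-on-demand and unload advice amounts to keeping a small least-recently-used cache of model buckets. A sketch under the ~200 MB-per-bucket figure from the Requirements section; the cache budget, class name, and `load_model` placeholder are assumptions, with the placeholder standing in for an actual CoreML `.mlpackage` load:

```python
from collections import OrderedDict

MB_PER_MODEL = 200  # rough per-bucket RAM cost (see Requirements)

class ModelCache:
    """LRU cache of loaded model buckets under a fixed memory budget."""

    def __init__(self, budget_mb: int = 600):
        self.budget_mb = budget_mb
        self.cache: OrderedDict[str, object] = OrderedDict()

    def load_model(self, name: str) -> object:
        # Placeholder for compiling/loading a CoreML .mlpackage.
        return f"<model {name}>"

    def get(self, name: str) -> object:
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as recently used
        else:
            # Evict least-recently-used buckets to stay under budget.
            while (len(self.cache) + 1) * MB_PER_MODEL > self.budget_mb:
                self.cache.popitem(last=False)
            self.cache[name] = self.load_model(name)
        return self.cache[name]

cache = ModelCache(budget_mb=600)  # room for three buckets
for bucket in ["b128", "b256", "b512", "b128", "b1024"]:
    cache.get(bucket)
print(list(cache.cache))  # ['b512', 'b128', 'b1024']
```

Selecting which bucket to request would follow the predicted duration from Stage 1, so only the buckets a given utterance actually needs stay resident.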

πŸ“₯ Usage

Clone the repository:

git clone https://huggingface.co/\<username>/<repo>

Or download via Hugging Face CLI:

huggingface-cli download / --local-dir .


πŸ“Œ License

Refer to the original Kokoro TTS license and any included third-party licenses inside their respective folders.

Ensure attribution is preserved if redistributing.


πŸ™ Credits

Based on the Kokoro TTS CoreML conversion project:
https://github.com/philipdaquin/Kokoro-tts-coreml
