# CoreML Speech Models

A 13-item collection of speech AI models for the Apple Neural Engine via CoreML, ready for iOS and macOS: ASR, TTS, VAD, and speaker diarization.
Pre-compiled CoreML models for Kokoro-82M text-to-speech, optimized for the Apple Neural Engine.
| Model | Max Tokens | Max Audio | Target |
|---|---|---|---|
| kokoro_24_10s | 242 | 10.0s | iOS 17+ / macOS 14+ |
| kokoro_24_15s | 242 | 15.0s | iOS 17+ / macOS 14+ |
| kokoro_21_5s | 124 | 7.3s | iOS 16+ / macOS 13+ |
| kokoro_21_10s | 168 | 10.6s | iOS 16+ / macOS 13+ |
| kokoro_21_15s | 249 | 15.5s | iOS 16+ / macOS 13+ |
50 preset voices spanning English (US/UK), Spanish, French, Hindi, Italian, Japanese, Portuguese, and Chinese.
Separate encoder-decoder CoreML models provide a neural G2P (grapheme-to-phoneme) fallback for out-of-vocabulary words. Apache-2.0 licensed.
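The lookup-then-fallback pattern described above can be sketched as follows. The `lexicon` contents and the `neuralG2P` function are placeholders, not the package API; a real implementation would run the encoder-decoder CoreML model instead of the letter-by-letter stub.

```swift
// Tiny stand-in lexicon of known words (illustrative only).
let lexicon: [String: [String]] = [
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
]

/// Stand-in for the encoder-decoder CoreML G2P model. Here it just
/// spells the word out letter by letter so the sketch is runnable.
func neuralG2P(_ word: String) -> [String] {
    word.uppercased().map { String($0) }
}

/// Dictionary lookup first; neural G2P only for out-of-vocabulary words.
func phonemize(_ word: String) -> [String] {
    lexicon[word.lowercased()] ?? neuralG2P(word)
}
```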
Inputs:
- `input_ids` [1, N] Int32 – phoneme token IDs
- `attention_mask` [1, N] Int32 – 1 for real tokens, 0 for padding
- `ref_s` [1, 256] Float32 – voice style embedding
- `random_phases` [1, 9] Float32 – random phases for iSTFTNet

Outputs:
- `audio` [1, 1, S] Float32 – 24 kHz waveform
- `audio_length_samples` [1] Int32 – valid sample count
- `pred_dur` [1, N] Float32 – predicted phoneme durations

```swift
import KokoroTTS

let tts = try await KokoroTTSModel.fromPretrained()
let audio = try tts.synthesize(text: "Hello world", voice: "af_heart")
```
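If you drive the CoreML model directly rather than through the high-level API, the `input_ids` and `attention_mask` tensors described above must be padded to the variant's fixed length N. A minimal sketch, assuming token id 0 is the padding id (an assumption, not confirmed by the source):

```swift
/// Pads a phoneme token sequence to a fixed length N and builds the
/// matching attention mask (1 for real tokens, 0 for padding), as the
/// [1, N] input spec above requires. Pad id 0 is an assumption.
func padInputs(tokens: [Int32], toLength n: Int) -> (inputIds: [Int32], attentionMask: [Int32]) {
    precondition(tokens.count <= n, "sequence exceeds the variant's token limit")
    let padCount = n - tokens.count
    let inputIds = tokens + Array(repeating: Int32(0), count: padCount)
    let mask = Array(repeating: Int32(1), count: tokens.count)
             + Array(repeating: Int32(0), count: padCount)
    return (inputIds, mask)
}
```

The two arrays can then be copied into `MLMultiArray`s of shape [1, N] before prediction.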