Kokoro-82M CoreML

Pre-compiled CoreML models for Kokoro-82M text-to-speech, optimized for Apple Neural Engine.

Models

Model          Max Tokens  Max Audio  Target
kokoro_24_10s  242         10.0s      iOS 17+ / macOS 14+
kokoro_24_15s  242         15.0s      iOS 17+ / macOS 14+
kokoro_21_5s   124         7.3s       iOS 16+ / macOS 13+
kokoro_21_10s  168         10.6s      iOS 16+ / macOS 13+
kokoro_21_15s  249         15.5s      iOS 16+ / macOS 13+
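
The token limits in the table suggest picking the smallest model that still fits a given input. The following is a minimal sketch of that selection, not part of the KokoroTTS API: `ModelVariant`, `v21Variants`, and `smallestVariant(fitting:in:)` are hypothetical names introduced for illustration, and only the iOS 16+ / macOS 13+ rows are included.

```swift
// Hypothetical description of one row of the Models table.
struct ModelVariant {
    let name: String
    let maxTokens: Int
    let maxAudioSeconds: Double
}

// Values copied from the table (iOS 16+ / macOS 13+ models only).
let v21Variants = [
    ModelVariant(name: "kokoro_21_5s", maxTokens: 124, maxAudioSeconds: 7.3),
    ModelVariant(name: "kokoro_21_10s", maxTokens: 168, maxAudioSeconds: 10.6),
    ModelVariant(name: "kokoro_21_15s", maxTokens: 249, maxAudioSeconds: 15.5),
]

// Returns the variant with the smallest token limit that still fits `tokenCount`,
// or nil when the input is longer than every listed variant allows.
func smallestVariant(fitting tokenCount: Int, in variants: [ModelVariant]) -> ModelVariant? {
    return variants
        .filter { $0.maxTokens >= tokenCount }
        .min { $0.maxTokens < $1.maxTokens }
}
```

For example, `smallestVariant(fitting: 150, in: v21Variants)` selects `kokoro_21_10s`, while an input above 249 tokens yields `nil` and would need to be split before synthesis.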

Voices

50 preset voices across 10 languages: English (US/UK), Spanish, French, Hindi, Italian, Japanese, Portuguese, Chinese.

G2P (Grapheme-to-Phoneme)

Separate encoder-decoder CoreML models for neural G2P fallback on out-of-vocabulary words. Apache-2.0 licensed.

Model I/O

Inputs:

  • input_ids [1, N] Int32: phoneme token IDs
  • attention_mask [1, N] Int32: 1 for real tokens, 0 for padding
  • ref_s [1, 256] Float32: voice style embedding
  • random_phases [1, 9] Float32: random phases for iSTFTNet
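
The relationship between `input_ids` and `attention_mask` can be sketched with plain Swift arrays. This is an illustration under stated assumptions rather than the library's own preprocessing: `padTokens` is a hypothetical helper, the padding token ID of 0 is an assumption (the pad ID is not stated here), and copying the arrays into CoreML inputs is left out.

```swift
// Pads phoneme token IDs to a fixed length N and builds the matching mask.
// Assumption: padding positions use token ID 0; the actual pad ID is not documented here.
func padTokens(_ tokens: [Int32], to length: Int, padID: Int32 = 0)
    -> (inputIDs: [Int32], attentionMask: [Int32])? {
    // An input longer than N cannot be represented; the caller must split it.
    guard tokens.count <= length else { return nil }
    let padCount = length - tokens.count
    let inputIDs = tokens + [Int32](repeating: padID, count: padCount)
    // 1 marks a real token, 0 marks padding, matching the attention_mask description.
    let attentionMask = [Int32](repeating: 1, count: tokens.count)
        + [Int32](repeating: 0, count: padCount)
    return (inputIDs, attentionMask)
}
```

Both returned arrays have length N, so they map directly onto the `[1, N]` shapes listed above.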

Outputs:

  • audio [1, 1, S] Float32: 24 kHz waveform
  • audio_length_samples [1] Int32: valid sample count
  • pred_dur [1, N] Float32: predicted phoneme durations
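
Because `audio` has a fixed length S while `audio_length_samples` reports how many samples are valid, the buffer presumably needs trimming before playback. A minimal sketch of that step, assuming the waveform has already been copied into a `[Float]` array; `trimAudio` is a hypothetical helper, not part of the KokoroTTS API.

```swift
let sampleRate = 24_000.0  // 24 kHz, as stated for the `audio` output

// Keeps only the first `validCount` samples and reports their duration in seconds.
func trimAudio(_ samples: [Float], validCount: Int) -> (samples: [Float], seconds: Double) {
    // Clamp so a bad count can never read past the buffer or go negative.
    let count = max(0, min(validCount, samples.count))
    let trimmed = Array(samples.prefix(count))
    return (trimmed, Double(count) / sampleRate)
}
```

The duration follows from dividing the valid sample count by the 24 kHz sample rate, so 36,000 valid samples correspond to 1.5 seconds of audio.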

Usage

import KokoroTTS

let tts = try await KokoroTTSModel.fromPretrained()
let audio = try tts.synthesize(text: "Hello world", voice: "af_heart")

License


Links
