Kokoro-82M-Swift

Converted model weights for Kokoro-82M, optimized for Swift inference on Apple Silicon.

Use with the kokoro-swift Swift package.

Formats

Directory	Format	Backend	Notes
`MLX_GPU/`	safetensors + npy	MLX-Swift (GPU)	Primary inference path via Metal
`CoreML_ANE/segmented/`	mlpackage × 4	CoreML (ANE + CPU)	Segmented for optimal Neural Engine utilization

Files

config.json                              # Model config (vocab, architecture)
MLX_GPU/
  kokoro-v1_0.safetensors               # MLX model weights (~310 MB)
  voices/                                # Voice style packs (.npy, 54 voices)
    af_heart.npy, af_bella.npy, ...
CoreML_ANE/segmented/
  albert.mlpackage                       # ALBERT encoder (ANE)
  decoder.mlpackage                      # Vocoder/decoder (ANE)
  prosody.mlpackage                      # Prosody predictor (CPU)
  text_encoder.mlpackage                 # Text encoder (CPU)
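
Before pointing an app at a downloaded copy of this repository, it can help to confirm that the layout above is complete. The following is a minimal sketch using only Foundation; `expectedPaths`, `missingPaths(under:)`, and `availableVoices(under:)` are hypothetical helper names for illustration and are not part of kokoro-swift.

```swift
import Foundation

/// Relative paths taken from the file listing above.
let expectedPaths = [
    "config.json",
    "MLX_GPU/kokoro-v1_0.safetensors",
    "MLX_GPU/voices",
    "CoreML_ANE/segmented/albert.mlpackage",
    "CoreML_ANE/segmented/decoder.mlpackage",
    "CoreML_ANE/segmented/prosody.mlpackage",
    "CoreML_ANE/segmented/text_encoder.mlpackage",
]

/// Returns the expected paths that do not exist under `root`.
/// (Hypothetical helper, not a kokoro-swift API.)
func missingPaths(under root: URL, fileManager: FileManager = .default) -> [String] {
    expectedPaths.filter { relative in
        !fileManager.fileExists(atPath: root.appendingPathComponent(relative).path)
    }
}

/// Lists voice identifiers (file names without ".npy") found in MLX_GPU/voices.
/// Returns an empty array when the directory is missing or unreadable.
func availableVoices(under root: URL, fileManager: FileManager = .default) -> [String] {
    let voicesDir = root.appendingPathComponent("MLX_GPU/voices")
    let files = (try? fileManager.contentsOfDirectory(atPath: voicesDir.path)) ?? []
    return files
        .filter { $0.hasSuffix(".npy") }
        .map { String($0.dropLast(4)) }
        .sorted()
}
```

If only the MLX path is used, the CoreML entries can be dropped from `expectedPaths`, and vice versa.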

Voices

54 voice packs covering multiple languages and styles. Voice names follow the pattern {lang}{gender}_{name}:

af_* — American Female, am_* — American Male
bf_* — British Female, bm_* — British Male
ef_* / em_* — Spanish, ff_* — French, jf_* / jm_* — Japanese, etc.
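
The naming pattern above can be decoded mechanically, for example to group voices by language in a picker. This is a sketch only: `VoiceID`, `parseVoiceID`, and `languageLabel(for:)` are hypothetical names, and the label table covers just the prefixes listed above, so any other prefix returns `nil`.

```swift
import Foundation

/// Parsed components of a voice identifier such as "af_heart".
/// (Hypothetical type for illustration, not part of kokoro-swift.)
struct VoiceID: Equatable {
    let languageCode: Character  // first prefix character, e.g. "a"
    let genderCode: Character    // second prefix character: "f" or "m"
    let name: String             // text after the underscore, e.g. "heart"
}

/// Splits an identifier of the form {lang}{gender}_{name}.
/// Returns nil when the string does not match that shape.
func parseVoiceID(_ raw: String) -> VoiceID? {
    let parts = raw.split(separator: "_", maxSplits: 1, omittingEmptySubsequences: false)
    guard parts.count == 2, parts[0].count == 2, !parts[1].isEmpty else { return nil }
    let prefix = Array(parts[0])
    guard prefix[1] == "f" || prefix[1] == "m" else { return nil }
    return VoiceID(languageCode: prefix[0], genderCode: prefix[1], name: String(parts[1]))
}

/// Labels for the language prefixes listed above; unlisted prefixes return nil.
func languageLabel(for code: Character) -> String? {
    switch code {
    case "a": return "American"
    case "b": return "British"
    case "e": return "Spanish"
    case "f": return "French"
    case "j": return "Japanese"
    default: return nil
    }
}
```

Note that `f` appears in both positions: in `ff_*` the first character means French and the second means female, which is why the two are parsed by position rather than by value.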

Usage with kokoro-swift

import Kokoro

// Download a voice on demand
let voiceURL = try await VoiceDownloader.download(voice: "af_heart")

// Or use the CLI
// KokoroCLI --text "Hello world" --voice af_heart --output hello.wav --weights-dir MLX_GPU
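
For batch jobs on macOS, the CLI invocation above can also be driven from Swift with Foundation's `Process`. The sketch below uses only the flags shown in the comment; `kokoroCLIArguments` and `runKokoroCLI` are hypothetical helpers, and the executable URL is an assumption about where a locally built `KokoroCLI` binary lives.

```swift
import Foundation

/// Builds the argument list for the KokoroCLI invocation shown above.
/// (Hypothetical helper, not part of kokoro-swift.)
func kokoroCLIArguments(text: String, voice: String, output: String, weightsDir: String) -> [String] {
    ["--text", text, "--voice", voice, "--output", output, "--weights-dir", weightsDir]
}

/// Runs the CLI and returns its exit status.
/// `executable` is an assumed path to a locally built KokoroCLI binary.
func runKokoroCLI(executable: URL, arguments: [String]) throws -> Int32 {
    let process = Process()
    process.executableURL = executable
    process.arguments = arguments
    try process.run()
    process.waitUntilExit()
    return process.terminationStatus
}
```

Passing arguments as an array avoids shell quoting problems when the text contains spaces or quotes. `Process` is not available on iOS, so this approach applies to macOS command-line or desktop tooling only.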

See kokoro-swift for full documentation.

Source

Converted from hexgrad/Kokoro-82M using the conversion scripts in kokoro-swift.
