metadata
license: cc-by-4.0
library_name: coreml
tags:
  - tts
  - text-to-speech
  - coreml
  - apple
  - on-device
language:
  - en

PocketTTS CoreML

CoreML conversion of kyutai/pocket-tts for on-device inference on Apple platforms.

Models

| Model | Description | Size |
|---|---|---|
| cond_step | KV cache prefill (voice + text conditioning) | ~200 MB |
| flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200 MB |
| flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190 MB |
| mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11 MB |
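
The "8 Euler steps per frame" and "1920 samples per frame" figures above can be made concrete with a small sketch. It is illustrative only and is not the FluidAudio implementation: `eulerIntegrate` and its `velocity` closure are hypothetical stand-ins for repeated flow_decoder calls, the integration direction (t = 0 to t = 1) is an assumption, and the 24 kHz sample rate comes from the upstream Mimi codec rather than from this README.

```swift
// Sketch only: `velocity` is a hypothetical stand-in for one flow_decoder call.
func eulerIntegrate(
    from start: [Float],
    steps: Int = 8,
    velocity: ([Float], Float) -> [Float]
) -> [Float] {
    precondition(steps > 0, "steps must be positive")
    var x = start
    let dt = 1.0 / Float(steps)          // uniform step size over t in [0, 1]
    for k in 0..<steps {
        let t = Float(k) * dt            // time at the start of this step
        let v = velocity(x, t)
        precondition(v.count == x.count, "velocity must match state size")
        for i in x.indices {
            x[i] += dt * v[i]            // x_{k+1} = x_k + dt * v(x_k, t_k)
        }
    }
    return x
}

// Toy check: a constant velocity field carries the start point to `target` at t = 1.
let start: [Float] = [0, 0, 0]
let target: [Float] = [1.0, -2.0, 0.5]
let end = eulerIntegrate(from: start) { _, _ in target }

// Frame duration, assuming Mimi's 24 kHz output rate (not stated in this README).
let samplesPerFrame = 1920
let assumedSampleRate = 24_000
let frameSeconds = Double(samplesPerFrame) / Double(assumedSampleRate)
```

Under that sample-rate assumption, each generated frame covers 80 ms of audio, so the flow decoder runs its 8 steps roughly 12.5 times per second of speech.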

Voices

Four pre-encoded voices are included in constants_bin/:

  • alba (default), azelma, cosette, javert

Voice cloning weights are not included; Kyutai gates them separately.
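
If an app exposes voice selection, one way to guard against misspelled names is a small typed list. This is a minimal sketch under stated assumptions: `PocketVoice` is a hypothetical helper and not part of FluidAudio, and how a voice is actually passed to the framework is not shown here.

```swift
// Hypothetical helper: a typed list of the four bundled voice names.
enum PocketVoice: String, CaseIterable {
    case alba, azelma, cosette, javert

    // alba is listed as the default voice above.
    static let fallback: PocketVoice = .alba

    // Case-insensitive lookup that falls back to the default for unknown names.
    init(name: String) {
        self = PocketVoice(rawValue: name.lowercased()) ?? PocketVoice.fallback
    }
}

let requested = PocketVoice(name: "Cosette")   // matches the cosette case
let unknown = PocketVoice(name: "marius")      // not bundled, so falls back to alba
```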

Usage

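import FluidAudioTTS

// Run from an async context (for example, inside a Task or an async function).
let manager = PocketTtsManager()
// One-time setup before the first synthesis call.
try await manager.initialize()
// Synthesize speech for the given text.
let audio = try await manager.synthesize(text: "Hello, world!")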

See https://github.com/FluidInference/FluidAudio for the full Swift framework.

License

CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.

References

- https://huggingface.co/kyutai/pocket-tts
- https://arxiv.org/abs/2410.00037
- https://github.com/FluidInference/FluidAudio