
alexwengg

Update README.md

673ca7d verified 15 days ago



license: cc-by-4.0
library_name: coreml
tags:
  - tts
  - text-to-speech
  - coreml
  - apple
  - on-device
language:
  - en

PocketTTS CoreML

CoreML conversion of kyutai/pocket-tts for on-device inference on Apple platforms.

Models

| Model | Description | Size |
|---|---|---|
| `cond_step` | KV cache prefill (voice + text conditioning) | ~200 MB |
| `flowlm_step` | Autoregressive generation step (`transformer_out` + EOS) | ~200 MB |
| `flow_decoder` | Flow-matching denoiser (8 Euler steps per frame) | ~190 MB |
| `mimi_decoder` | Streaming audio codec decoder (1920 samples per frame) | ~11 MB |
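The `flow_decoder` row says each frame is denoised with 8 Euler steps. As a rough illustration of what that means, here is a minimal sketch of fixed-step Euler integration for a flow-matching ODE. It assumes the convention where time runs from t = 0 (noise) to t = 1 (data), and it replaces the Core ML model with a plain closure; `eulerSample` and its parameters are hypothetical names, not part of FluidAudio.

```swift
/// Hypothetical sketch: integrates dx/dt = velocity(x, t) from t = 0 to t = 1
/// using a fixed number of explicit Euler steps (the table lists 8 per frame).
/// In a real pipeline, `velocity` would presumably wrap a `flow_decoder` call
/// conditioned on `transformer_out`; here it is just a closure (an assumption).
func eulerSample(
    initialNoise: [Float],
    steps: Int = 8,
    velocity: ([Float], Float) -> [Float]
) -> [Float] {
    precondition(steps > 0, "steps must be positive")
    var x = initialNoise
    let dt = 1.0 / Float(steps)
    for i in 0..<steps {
        let t = Float(i) * dt          // time at the start of this step
        let v = velocity(x, t)         // predicted velocity at (x, t)
        precondition(v.count == x.count, "velocity must match latent size")
        for j in x.indices {
            x[j] += dt * v[j]          // explicit Euler update: x <- x + dt * v
        }
    }
    return x
}
```

For example, `eulerSample(initialNoise: [0, 0]) { _, _ in [1, 2] }` moves the latent by exactly the constant velocity over the unit interval and returns `[1, 2]`. A fixed, small step count keeps per-frame cost predictable, which matters for streaming on-device generation.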

Voices

Four pre-encoded voices are included in `constants_bin/`:

alba (default), azelma, cosette, javert

Voice cloning weights are not included; Kyutai gates them separately.
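If an app exposes voice selection to users, it can help to validate names against the four bundled voices and fall back to the default. The sketch below assumes you handle this in your own code; `PocketVoice` and `resolve(_:)` are hypothetical and are not FluidAudio's API.

```swift
/// Hypothetical helper: the four pre-encoded voices, with `alba` as the default.
/// The enum name and fallback behaviour are illustrative assumptions only.
enum PocketVoice: String, CaseIterable {
    case alba, azelma, cosette, javert

    static let defaultVoice: PocketVoice = .alba

    /// Resolves a user-supplied name (case-insensitive), falling back to the default voice
    /// when the name is missing or does not match a bundled voice.
    static func resolve(_ name: String?) -> PocketVoice {
        guard let key = name?.lowercased(), let voice = PocketVoice(rawValue: key) else {
            return defaultVoice
        }
        return voice
    }
}
```

For instance, `PocketVoice.resolve("Javert")` yields `.javert`, while `PocketVoice.resolve("unknown")` falls back to `.alba`.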

Usage

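```swift
import FluidAudioTTS

// Top-level `await` works in main.swift or a script; elsewhere, call this from an async function.
let manager = PocketTtsManager()
try await manager.initialize()  // initialize once before synthesizing

// Synthesize speech from text.
let audio = try await manager.synthesize(text: "Hello, world!")
```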

See https://github.com/FluidInference/FluidAudio for the full Swift framework.

License

CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.

References

- https://huggingface.co/kyutai/pocket-tts
- https://arxiv.org/abs/2410.00037
- https://github.com/FluidInference/FluidAudio