---
license: cc-by-4.0
library_name: coreml
tags:
- tts
- text-to-speech
- coreml
- apple
- on-device
language:
- en
---

# PocketTTS CoreML

CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device inference on Apple platforms.

## Models

| Model | Description | Size |
|-------|-------------|------|
| cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
| flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
| flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
| mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |

## Voices

Four pre-encoded voices in `constants_bin/`:

- `alba` (default), `azelma`, `cosette`, `javert`

Voice cloning weights are **not included**; they are gated separately by Kyutai.

## Usage

```swift
import FluidAudioTTS

// Run from an async context: both calls below are `async throws`.
let manager = PocketTtsManager()
try await manager.initialize()
let audio = try await manager.synthesize(text: "Hello, world!")
```

See https://github.com/FluidInference/FluidAudio for the full Swift framework.

## License

CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.

## References

- https://huggingface.co/kyutai/pocket-tts
- https://arxiv.org/abs/2410.00037
- https://github.com/FluidInference/FluidAudio
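
## Frame timing (sketch)

The per-frame figures in the Models table (1920 samples per frame for `mimi_decoder`, 8 Euler steps per frame for `flow_decoder`) can be turned into a rough cost estimate. The sketch below is illustrative only: the 24 kHz sample rate is an assumption not stated in this README, and `frameCount(forSeconds:)` is a hypothetical helper, not part of FluidAudio.

```swift
import Foundation

// Assumption: 24 kHz output sample rate (not stated in this README).
let assumedSampleRate = 24_000.0
// From the Models table: mimi_decoder emits 1920 samples per frame.
let samplesPerFrame = 1920.0
// From the Models table: flow_decoder runs 8 Euler steps per frame.
let eulerStepsPerFrame = 8

// Duration of one frame in seconds, and frames needed per second of audio.
let frameDuration = samplesPerFrame / assumedSampleRate
let framesPerSecond = assumedSampleRate / samplesPerFrame

/// Hypothetical helper: number of frames needed to cover `seconds` of audio,
/// rounded up to a whole frame.
func frameCount(forSeconds seconds: Double) -> Int {
    Int((seconds * framesPerSecond).rounded(.up))
}

let frames = frameCount(forSeconds: 2.0)
// One denoiser evaluation per Euler step, for every generated frame.
let denoiserCalls = frames * eulerStepsPerFrame

print("frame duration: \(frameDuration * 1000) ms")
print("frames for 2 s: \(frames), flow_decoder evaluations: \(denoiserCalls)")
```

Under that assumed sample rate, each frame covers about 80 ms of audio, so the denoiser cost grows linearly with output length.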