---
license: cc-by-4.0
library_name: coreml
tags:
  - tts
  - text-to-speech
  - coreml
  - apple
  - on-device
language:
  - en
---

# PocketTTS CoreML

CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
inference on Apple platforms.

## Models

| Model | Description | Size |
|-------|-------------|------|
| cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
| flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
| flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
| mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
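
The "8 Euler steps per frame" noted for `flow_decoder` can be illustrated with a minimal fixed-step Euler integrator. This is only a sketch: `VelocityField` and `eulerIntegrate` are hypothetical names, the closure stands in for the CoreML model call, and the integration interval (t from 0 to 1) is an assumption rather than something taken from the converted model.

```swift
// Hypothetical stand-in for one flow_decoder evaluation:
// given the current latent x and time t, return a velocity of the same length.
typealias VelocityField = (_ x: [Float], _ t: Float) -> [Float]

/// Integrates dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step forward Euler.
/// The default of 8 steps mirrors the step count listed in the table above.
func eulerIntegrate(from x0: [Float], steps: Int = 8, velocity: VelocityField) -> [Float] {
    precondition(steps > 0, "steps must be positive")
    var x = x0
    let dt = 1.0 / Float(steps)
    for i in 0..<steps {
        let t = Float(i) * dt          // time at the start of this step
        let v = velocity(x, t)
        precondition(v.count == x.count, "velocity must match latent size")
        for j in x.indices {
            x[j] += dt * v[j]          // forward Euler update
        }
    }
    return x
}

// Toy check with a constant velocity of 1: every coordinate advances by 1.
let result = eulerIntegrate(from: [0, 2], velocity: { x, _ in x.map { _ in Float(1) } })
```

In the real pipeline the closure would wrap a model prediction, so each generated frame would cost one model call per Euler step under this scheme.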

## Voices

Four pre-encoded voices are included in `constants_bin/`:
- `alba` (default), `azelma`, `cosette`, `javert`

Voice cloning weights are **not included**; they are gated separately by Kyutai.

## Usage

```swift
import FluidAudioTTS

let manager = PocketTtsManager()
try await manager.initialize()
let audio = try await manager.synthesize(text: "Hello, world!")
```

See [FluidAudio](https://github.com/FluidInference/FluidAudio) for the full Swift framework.

## License

CC-BY-4.0, inherited from [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts). Attribution to Kyutai is required.

## References

- https://huggingface.co/kyutai/pocket-tts
- https://arxiv.org/abs/2410.00037
- https://github.com/FluidInference/FluidAudio