---
license: cc-by-4.0
library_name: coreml
tags:
- tts
- text-to-speech
- coreml
- apple
- on-device
language:
- en
---
# PocketTTS CoreML
CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
inference on Apple platforms.
## Models
| Model | Description | Size |
|-------|-------------|------|
| cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
| flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
| flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
| mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
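
The per-frame figures in the table can be illustrated with a small, self-contained sketch. It is not the FluidAudio implementation: `eulerIntegrate`, `audioDuration`, and the `velocity` closure are hypothetical names, the closure stands in for a `flow_decoder` call whose actual inputs and time parameterization are not documented here, and the 24 kHz sample rate is an assumption that this README does not state.

```swift
import Foundation

// From the Models table: mimi_decoder emits 1920 samples per frame.
let samplesPerFrame = 1920
// From the Models table: flow_decoder runs 8 Euler steps per frame.
let eulerStepsPerFrame = 8
// Assumption (not stated in this README): 24 kHz output sample rate.
let assumedSampleRate = 24_000.0

/// Seconds of audio produced by `frames` generated frames.
func audioDuration(frames: Int, sampleRate: Double = assumedSampleRate) -> Double {
    Double(frames * samplesPerFrame) / sampleRate
}

/// Generic fixed-step Euler integration from t = 0 to t = 1.
/// `velocity` is a hypothetical stand-in for the flow_decoder model call.
func eulerIntegrate(
    from start: [Float],
    steps: Int = eulerStepsPerFrame,
    velocity: ([Float], Float) -> [Float]
) -> [Float] {
    var x = start
    let dt = 1.0 / Float(steps)
    for step in 0..<steps {
        let t = Float(step) * dt
        let v = velocity(x, t)
        // Euler update: x <- x + dt * v(x, t)
        for i in x.indices { x[i] += dt * v[i] }
    }
    return x
}

// Toy velocity field that is constant in x and t, used only to exercise the loop.
let target: [Float] = [0.5, -1.0, 2.0]
let latent = eulerIntegrate(from: [0, 0, 0]) { _, _ in target }
print(latent)
print(audioDuration(frames: 25))
```

With a constant velocity field, eight steps of size 1/8 move the starting point by exactly one unit of that field, which makes the loop easy to check by hand. Under the assumed 24 kHz rate, each 1920-sample frame covers 80 ms, so 25 frames correspond to 2 seconds of audio.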
## Voices
Four pre-encoded voices are provided in `constants_bin/`:
- `alba` (default), `azelma`, `cosette`, `javert`
Voice cloning weights are **not included** — they are gated separately by Kyutai.
## Usage
```swift
import FluidAudioTTS

let manager = PocketTtsManager()
try await manager.initialize()
let audio = try await manager.synthesize(text: "Hello, world!")
```
See [FluidAudio](https://github.com/FluidInference/FluidAudio) for the full Swift framework.
## License
CC-BY-4.0, inherited from [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts). Attribution to Kyutai is required.
## References
- https://huggingface.co/kyutai/pocket-tts
- https://arxiv.org/abs/2410.00037
- https://github.com/FluidInference/FluidAudio