---
license: cc-by-4.0
library_name: coreml
tags:
  - tts
  - text-to-speech
  - coreml
  - apple
  - on-device
language:
  - en
---

# PocketTTS CoreML

CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
inference on Apple platforms.

## Models

| Model | Description | Size |
|-------|-------------|------|
| cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
| flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
| flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
| mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
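
As a rough illustration of two numbers in the table, the sketch below shows what "8 Euler steps per frame" and "1920 samples per frame" might look like in code. It is not the FluidAudio implementation. The `VelocityField` closure, the `eulerIntegrate` and `audioDuration` helpers, the 0-to-1 time convention, and the 24 kHz sample rate are all assumptions made for the example; in the real pipeline the velocity would come from the `flow_decoder` model.

```swift
/// Hypothetical stand-in for the flow decoder: maps (latent, time) to a velocity.
typealias VelocityField = (_ x: [Float], _ t: Float) -> [Float]

/// Fixed-step Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
/// The 0...1 time range is an assumption, not taken from the model card.
func eulerIntegrate(from x0: [Float], steps: Int = 8, velocity: VelocityField) -> [Float] {
    precondition(steps > 0, "steps must be positive")
    var x = x0
    let dt = 1.0 / Float(steps)
    for k in 0..<steps {
        let t = Float(k) * dt
        let v = velocity(x, t)
        precondition(v.count == x.count, "velocity must match latent size")
        for i in x.indices {
            x[i] += dt * v[i]  // x_{k+1} = x_k + dt * v(x_k, t_k)
        }
    }
    return x
}

/// Audio duration in seconds for a number of decoded frames.
/// 1920 samples per frame is from the table; the sample rate is an assumed parameter.
func audioDuration(frames: Int, samplesPerFrame: Int = 1920, sampleRate: Double) -> Double {
    return Double(frames * samplesPerFrame) / sampleRate
}

// Toy usage: a constant velocity field, so Euler integration is exact.
let latent = eulerIntegrate(from: [0, 0]) { _, _ in [1, -2] }
print(latent)  // [1.0, -2.0]

// Assuming 24 kHz output, 25 frames correspond to 2 seconds of audio.
print(audioDuration(frames: 25, sampleRate: 24_000))  // 2.0
```

With a fixed step count, each frame costs eight `flow_decoder` evaluations, so the step count is the main knob trading quality against latency in a sampler like this.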

## Voices

Four pre-encoded voices are included in `constants_bin/`:
- `alba` (default), `azelma`, `cosette`, `javert`

Voice cloning weights are **not included**; Kyutai gates them separately.

## Usage

```swift
import FluidAudioTTS

// Initialize the manager, then synthesize speech from text.
let manager = PocketTtsManager()
try await manager.initialize()
let audio = try await manager.synthesize(text: "Hello, world!")
```

See https://github.com/FluidInference/FluidAudio for the full Swift framework.

## License

CC-BY-4.0, inherited from [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts). Attribution to Kyutai is required.

## References

- https://huggingface.co/kyutai/pocket-tts
- https://arxiv.org/abs/2410.00037
- https://github.com/FluidInference/FluidAudio