alexwengg committed · Commit 5738f6a · verified · 1 parent: 1039ebf

Create README.md

Files changed (1)
  1. README.md +55 -0
README.md ADDED
@@ -0,0 +1,55 @@
+
+ ---
+ license: cc-by-4.0
+ library_name: coreml
+ tags:
+ - tts
+ - text-to-speech
+ - coreml
+ - apple
+ - on-device
+ language:
+ - en
+ ---
+
+ # PocketTTS CoreML
+
+ CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
+ inference on Apple platforms.
+
+ ## Models
+
+ | Model | Description | Size |
+ |-------|-------------|------|
+ | cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
+ | flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
+ | flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
+ | mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
+
+ ## Voices
+
+ Four pre-encoded voices in `constants_bin/`:
+ - `alba` (default), `azelma`, `cosette`, `javert`
+
+ Voice cloning weights are **not included**; Kyutai gates them separately.
+
+ ## Usage
+
+ ```swift
+ import FluidAudioTTS
+
+ let manager = PocketTtsManager()
+ try await manager.initialize()
+ let audio = try await manager.synthesize(text: "Hello, world!")
+ ```
+ See https://github.com/FluidInference/FluidAudio for the full Swift framework.
+
+ ## License
+
+ CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.
+
+ ## References
+
+ - https://huggingface.co/kyutai/pocket-tts
+ - https://arxiv.org/abs/2410.00037
+ - https://github.com/FluidInference/FluidAudio
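
The Models table in the README gives two per-frame figures: `mimi_decoder` emits 1920 samples per frame, and `flow_decoder` runs 8 Euler steps per frame. The sketch below shows what those numbers might imply in practice. It is not taken from FluidAudio or the PocketTTS code. The 24 kHz sample rate is an assumption (the README does not state it), the helper names are hypothetical, and `eulerIntegrate` is a generic fixed-step Euler loop over an assumed time interval of 0 to 1, with a closure standing in for the actual CoreML model call.

```swift
import Foundation

// Figures taken from the Models table in the README.
let samplesPerFrame = 1920     // mimi_decoder output samples per frame
let eulerStepsPerFrame = 8     // flow_decoder denoising steps per frame

// ASSUMPTION: 24 kHz output sample rate; the README does not state this.
let assumedSampleRate = 24_000.0

/// Duration in seconds of one decoded audio frame.
func frameDuration(samplesPerFrame: Int, sampleRate: Double) -> Double {
    Double(samplesPerFrame) / sampleRate
}

/// Number of frames needed to cover `seconds` of audio, rounded up.
func framesNeeded(forSeconds seconds: Double,
                  samplesPerFrame: Int,
                  sampleRate: Double) -> Int {
    Int((seconds * sampleRate / Double(samplesPerFrame)).rounded(.up))
}

/// Generic fixed-step Euler integration of dx/dt = velocity(x, t) over t in [0, 1].
/// `velocity` is a hypothetical stand-in for a flow_decoder call.
func eulerIntegrate(start: [Double],
                    steps: Int,
                    velocity: ([Double], Double) -> [Double]) -> [Double] {
    var x = start
    let dt = 1.0 / Double(steps)
    for i in 0..<steps {
        let t = Double(i) * dt
        let v = velocity(x, t)
        // Advance every component by one Euler step.
        for j in x.indices {
            x[j] += dt * v[j]
        }
    }
    return x
}

// Example: per-frame latency budget and denoiser calls for one second of audio.
let frames = framesNeeded(forSeconds: 1.0,
                          samplesPerFrame: samplesPerFrame,
                          sampleRate: assumedSampleRate)
let denoiserCalls = frames * eulerStepsPerFrame
print("frames:", frames, "flow_decoder calls:", denoiserCalls)
```

Under the assumed sample rate, each frame covers 80 ms of audio, so the four models together would need to finish one frame in less than that to keep up with real-time playback.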