Commit 673ca7d (verified) by alexwengg · 1 parent: 5be82db

Update README.md

  1. README.md +40 -39
README.md CHANGED
@@ -1,55 +1,56 @@
 
- ---
- license: cc-by-4.0
- library_name: coreml
- tags:
- - tts
- - text-to-speech
- - coreml
- - apple
- - on-device
- language:
- - en
- ---
 
- # PocketTTS CoreML
+ ---
+ license: cc-by-4.0
+ library_name: coreml
+ tags:
+ - tts
+ - text-to-speech
+ - coreml
+ - apple
+ - on-device
+ language:
+ - en
+ ---
 
- CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
- inference on Apple platforms.
+ # PocketTTS CoreML
 
- ## Models
+ CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
+ inference on Apple platforms.
 
- | Model | Description | Size |
- |-------|-------------|------|
- | cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
- | flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
- | flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
- | mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
+ ## Models
 
- ## Voices
+ | Model | Description | Size |
+ |-------|-------------|------|
+ | cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
+ | flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
+ | flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
+ | mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
 
- 4 pre-encoded voices in `constants_bin/`:
- - `alba` (default), `azelma`, `cosette`, `javert`
+ ## Voices
 
- Voice cloning weights are **not included** — they are gated separately by Kyutai.
+ 4 pre-encoded voices in `constants_bin/`:
+ - `alba` (default), `azelma`, `cosette`, `javert`
 
- ## Usage
+ Voice cloning weights are **not included** — they are gated separately by Kyutai.
 
- ```swift
- import FluidAudioTTS
+ ## Usage
 
- let manager = PocketTtsManager()
- try await manager.initialize()
- let audio = try await manager.synthesize(text: "Hello, world!")
+ ```swift
+ import FluidAudioTTS
 
- See https://github.com/FluidInference/FluidAudio for the full Swift framework.
+ let manager = PocketTtsManager()
+ try await manager.initialize()
+ let audio = try await manager.synthesize(text: "Hello, world!")
 
- License
+ See https://github.com/FluidInference/FluidAudio for the full Swift framework.
 
- CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.
+ License
 
- References
+ CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.
 
- - https://huggingface.co/kyutai/pocket-tts
- - https://arxiv.org/abs/2410.00037
- - https://github.com/FluidInference/FluidAudio
+ References
+
+ - https://huggingface.co/kyutai/pocket-tts
+ - https://arxiv.org/abs/2410.00037
+ - https://github.com/FluidInference/FluidAudio
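
The per-frame figures in the Models table of this diff can be sanity-checked with a little arithmetic. The sketch below is not taken from the README or from FluidAudio. The 24 kHz sample rate is an assumption (the README states only 1920 samples per frame), and `eulerIntegrate` with its toy velocity field is a hypothetical stand-in for how a flow-matching denoiser might take 8 fixed Euler steps per frame.

```swift
import Foundation

// Figures stated in the README's Models table.
let samplesPerFrame = 1920   // mimi_decoder output per frame
let eulerSteps = 8           // flow_decoder Euler steps per frame

// ASSUMPTION: 24 kHz output sample rate (not stated in the README).
let assumedSampleRate = 24_000.0

// Duration of one frame, and frames needed per second of audio.
let frameSeconds = Double(samplesPerFrame) / assumedSampleRate
let framesPerSecond = 1.0 / frameSeconds

// Hypothetical helper: fixed-step Euler integration from t = 0 to t = 1.
// `velocity` stands in for the learned flow field; here it is any closure.
func eulerIntegrate(
    from start: [Double],
    steps: Int,
    velocity: ([Double], Double) -> [Double]
) -> [Double] {
    var x = start
    let dt = 1.0 / Double(steps)
    for i in 0..<steps {
        let t = Double(i) * dt
        let v = velocity(x, t)
        for j in x.indices { x[j] += dt * v[j] }
    }
    return x
}

// Toy field with constant velocity (target - start), so Euler lands on target.
let start = [0.0, 1.0]
let target = [2.0, -1.0]
let delta = zip(target, start).map { $0 - $1 }
let result = eulerIntegrate(from: start, steps: eulerSteps) { _, _ in delta }

print("frame duration: \(frameSeconds * 1000) ms, frames/s: \(framesPerSecond)")
print("flow_decoder evaluations per second of audio: \(framesPerSecond * Double(eulerSteps))")
print("euler result: \(result)")
```

Under the 24 kHz assumption each frame covers 80 ms, so roughly 12.5 frames, and about 100 `flow_decoder` evaluations, would be needed per second of synthesized audio.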