FluidInference/pocket-tts-coreml


alexwengg committed on Feb 1

Commit 673ca7d · verified · 1 Parent(s): 5be82db

Update README.md

Files changed (1)

README.md +40 -39


@@ -1,55 +1,56 @@
- ---
-  license: cc-by-4.0
-  library_name: coreml
-  tags:
-    - tts
-    - text-to-speech
-    - coreml
-    - apple
-    - on-device
-  language:
-    - en
-  ---
-  # PocketTTS CoreML
-  CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
-  inference on Apple platforms.
-  ## Models
-  | Model | Description | Size |
-  |-------|-------------|------|
-  | cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
-  | flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
-  | flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
-  | mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
-  ## Voices
-  4 pre-encoded voices in `constants_bin/`:
-  - `alba` (default), `azelma`, `cosette`, `javert`
-  Voice cloning weights are **not included** — they are gated separately by Kyutai.
-  ## Usage
-  ```swift
-  import FluidAudioTTS
-  let manager = PocketTtsManager()
-  try await manager.initialize()
-  let audio = try await manager.synthesize(text: "Hello, world!")
-  See https://github.com/FluidInference/FluidAudio for the full Swift framework.
-  License
-  CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.
-  References
-  - https://huggingface.co/kyutai/pocket-tts
-  - https://arxiv.org/abs/2410.00037
-  - https://github.com/FluidInference/FluidAudio

+---
+license: cc-by-4.0
+library_name: coreml
+tags:
+  - tts
+  - text-to-speech
+  - coreml
+  - apple
+  - on-device
+language:
+  - en
+---
+# PocketTTS CoreML
+CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
+inference on Apple platforms.
+## Models
+| Model | Description | Size |
+|-------|-------------|------|
+| cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
+| flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
+| flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
+| mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
+## Voices
+4 pre-encoded voices in `constants_bin/`:
+- `alba` (default), `azelma`, `cosette`, `javert`
+Voice cloning weights are **not included** — they are gated separately by Kyutai.
+## Usage
+```swift
+import FluidAudioTTS
+let manager = PocketTtsManager()
+try await manager.initialize()
+let audio = try await manager.synthesize(text: "Hello, world!")
+See https://github.com/FluidInference/FluidAudio for the full Swift framework.
+License
+CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.
+References
+- https://huggingface.co/kyutai/pocket-tts
+- https://arxiv.org/abs/2410.00037
+- https://github.com/FluidInference/FluidAudio
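
The per-frame figures in the Models table (1920 samples per frame for `mimi_decoder`, 8 Euler steps per frame for `flow_decoder`) can be turned into rough timing numbers. The sketch below is illustrative only: the 24 kHz sample rate is an assumption that the README does not state, and `frameCount(forSeconds:)` is a hypothetical helper, not part of FluidAudio.

```swift
import Foundation

// Assumption: the codec runs at a 24 kHz sample rate (not stated in the README).
let sampleRate = 24_000.0
// From the Models table: mimi_decoder emits 1920 samples per frame.
let samplesPerFrame = 1_920.0
// From the Models table: flow_decoder uses 8 Euler steps per frame.
let eulerStepsPerFrame = 8

// Seconds of audio produced by one frame, and frames produced per second.
let frameDuration = samplesPerFrame / sampleRate
let framesPerSecond = sampleRate / samplesPerFrame

/// Hypothetical helper: number of frames needed to cover `seconds` of audio,
/// rounded up so the final partial frame is included.
func frameCount(forSeconds seconds: Double) -> Int {
    Int((seconds * framesPerSecond).rounded(.up))
}

let frames = frameCount(forSeconds: 5.0)
print("frame duration (s): \(frameDuration)")
print("frames for 5 s of audio: \(frames)")
print("Euler steps for 5 s of audio: \(frames * eulerStepsPerFrame)")
```

If the actual sample rate differs, only `sampleRate` needs to change; the per-frame constants come directly from the table.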