FluidInference/pocket-tts-coreml


alexwengg committed on Feb 1

Commit 5738f6a · verified · 1 parent: 1039ebf

Create README.md

Files changed (1)

README.md +55 -0

README.md ADDED

	@@ -0,0 +1,55 @@

+  ---
+  license: cc-by-4.0
+  library_name: coreml
+  tags:
+    - tts
+    - text-to-speech
+    - coreml
+    - apple
+    - on-device
+  language:
+    - en
+  ---
+  # PocketTTS CoreML
+  CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
+  inference on Apple platforms.
+  ## Models
+  | Model | Description | Size |
+  |-------|-------------|------|
+  | cond_step | KV cache prefill (voice + text conditioning) | ~200MB |
+  | flowlm_step | Autoregressive generation (transformer_out + EOS) | ~200MB |
+  | flow_decoder | Flow matching denoiser (8 Euler steps per frame) | ~190MB |
+  | mimi_decoder | Streaming audio codec (1920 samples per frame) | ~11MB |
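The per-frame figures in the Models table (8 Euler steps, 1920 samples) can be made concrete with a small sketch. It is not taken from the FluidAudio source: `frameCount`, `eulerIntegrate`, and the toy velocity field are hypothetical names for illustration, and the 24 kHz sample rate is an assumption, since the README does not state one. In the real pipeline the velocity would presumably come from the `flow_decoder` model rather than a closure.

```swift
// Figures taken from the Models table.
let samplesPerFrame = 1920
let eulerStepsPerFrame = 8

// Assumption: 24 kHz output audio. The README does not state the sample rate.
let assumedSampleRate = 24_000.0

/// Number of codec frames needed to cover `seconds` of audio (rounded up).
func frameCount(forSeconds seconds: Double) -> Int {
    Int((seconds * assumedSampleRate / Double(samplesPerFrame)).rounded(.up))
}

/// Fixed-step Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1.
/// `velocity` is a stand-in for a learned flow-matching denoiser.
func eulerIntegrate(
    from start: [Float],
    steps: Int,
    velocity: ([Float], Float) -> [Float]
) -> [Float] {
    precondition(steps > 0, "steps must be positive")
    var x = start
    let dt = 1.0 / Float(steps)
    for step in 0..<steps {
        let t = Float(step) * dt
        let v = velocity(x, t)
        precondition(v.count == x.count, "velocity must match the state size")
        for i in x.indices {
            x[i] += dt * v[i]  // x_{k+1} = x_k + dt * v(x_k, t_k)
        }
    }
    return x
}

// Toy example: a constant velocity field that moves the zero vector to `target`.
let target: [Float] = [1.0, -2.0, 0.5]
let start = [Float](repeating: 0, count: target.count)
let result = eulerIntegrate(from: start, steps: eulerStepsPerFrame) { _, _ in target }

let frames = frameCount(forSeconds: 2.0)
print("frames:", frames, "denoiser evaluations:", frames * eulerStepsPerFrame)
print("integrated state:", result)
```

Under the 24 kHz assumption one frame covers 80 ms of audio, so the number of denoiser evaluations grows linearly with the length of the utterance.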
+  ## Voices
+  Four pre-encoded voices are provided in `constants_bin/`:
+  - `alba` (default), `azelma`, `cosette`, `javert`
+  Voice cloning weights are **not included**; Kyutai gates them separately.
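One way an app might guard against unknown voice names is to model the four listed voices as an enum with a fallback to the default. This is only a sketch: `PocketVoice` and `resolve` are hypothetical names, not part of FluidAudio, and how a voice is actually passed to the framework is not shown in this README.

```swift
/// Hypothetical helper type listing the voices named in the README.
enum PocketVoice: String, CaseIterable {
    case alba, azelma, cosette, javert

    /// The README marks `alba` as the default voice.
    static let defaultVoice: PocketVoice = .alba

    /// Parses a user-supplied name case-insensitively,
    /// falling back to the default voice when the name is missing or unknown.
    static func resolve(_ name: String?) -> PocketVoice {
        guard let name = name else { return defaultVoice }
        return PocketVoice(rawValue: name.lowercased()) ?? defaultVoice
    }
}

let voice = PocketVoice.resolve("Cosette")
print("selected voice:", voice.rawValue)
```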
+  ## Usage
+  ```swift
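+  import FluidAudioTTS
+  // Create the manager and initialize it before the first synthesis call.
+  let manager = PocketTtsManager()
+  try await manager.initialize()
+  // Synthesize speech for the given text.
+  let audio = try await manager.synthesize(text: "Hello, world!")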
+  ```
+  See https://github.com/FluidInference/FluidAudio for the full Swift framework.
+  ## License
+  CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.
+  ## References
+  - https://huggingface.co/kyutai/pocket-tts
+  - https://arxiv.org/abs/2410.00037
+  - https://github.com/FluidInference/FluidAudio