Duplicated from FluidInference/pocket-tts-coreml

	---
	license: cc-by-4.0
	library_name: coreml
	tags:
	- tts
	- text-to-speech
	- coreml
	- apple
	- on-device
	language:
	- en
	pipeline_tag: text-to-speech
	base_model:
	- kyutai/pocket-tts
	base_model_relation: finetune
	---

	# PocketTTS CoreML

	CoreML conversion of [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) for on-device
	inference on Apple platforms.

	## Models

	| Model | Description | Size |
	|-------|-------------|------|
	| `cond_step` | KV cache prefill (voice + text conditioning) | ~200 MB |
	| `flowlm_step` | Autoregressive generation (transformer_out + EOS) | ~200 MB |
	| `flow_decoder` | Flow matching denoiser (8 Euler steps per frame) | ~190 MB |
	| `mimi_decoder` | Streaming audio codec (1920 samples per frame) | ~11 MB |
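The per-frame figures in the table can be made concrete with a small sketch. The Swift snippet below is illustrative only: it integrates a generic velocity field with 8 fixed-size Euler steps, as the `flow_decoder` row describes, and converts a frame count into a duration using the 1920 samples per frame from the `mimi_decoder` row. The names `eulerIntegrate` and `audioDuration`, the toy velocity field, and the 24 kHz sample rate are assumptions made for illustration; none of them is part of the FluidAudio API, and the sample rate should be checked against the upstream model.

```swift
/// Integrates dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-size Euler steps.
/// `velocity` stands in for the flow-matching network (hypothetical placeholder).
func eulerIntegrate(
    start: [Float],
    steps: Int = 8,
    velocity: ([Float], Float) -> [Float]
) -> [Float] {
    precondition(steps > 0, "steps must be positive")
    var x = start
    let dt = 1.0 / Float(steps)
    for i in 0..<steps {
        let t = Float(i) * dt
        let v = velocity(x, t)
        precondition(v.count == x.count, "velocity must match the state size")
        // Euler update: x <- x + dt * v(x, t)
        for j in x.indices {
            x[j] += dt * v[j]
        }
    }
    return x
}

/// Duration in seconds of `frames` decoded frames.
/// The 24 kHz default sample rate is an assumption, not stated in this README.
func audioDuration(
    frames: Int,
    samplesPerFrame: Int = 1920,
    sampleRate: Double = 24_000
) -> Double {
    Double(frames * samplesPerFrame) / sampleRate
}

// Toy usage: a constant velocity of 1.0 shifts every component by exactly 1.0.
let latent = eulerIntegrate(start: [0, 0.5]) { x, _ in x.map { _ in 1.0 } }
let seconds = audioDuration(frames: 25)
print(latent, seconds) // prints: [1.0, 1.5] 2.0
```

In the real pipeline the velocity would come from the `flow_decoder` model rather than a closure; the loop structure is the only part this sketch is meant to show.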

	## Voices

	4 pre-encoded voices in `constants_bin/`:
	- `alba` (default), `azelma`, `cosette`, `javert`

	Voice cloning weights are not included; Kyutai gates them separately.
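One way an app might represent the bundled voices is a small enum, so that an unknown voice name falls back to the default. This is a hypothetical sketch: `BundledVoice` and `resolveVoice` are illustrative names, not FluidAudio types. How a voice is actually passed to the synthesizer should be taken from the FluidAudio documentation.

```swift
import Foundation

/// The four pre-encoded voices listed above.
/// Hypothetical type for illustration; not part of FluidAudio.
enum BundledVoice: String, CaseIterable {
    case alba, azelma, cosette, javert

    /// `alba` is marked as the default in the voice list.
    static let defaultVoice: BundledVoice = .alba
}

/// Maps a user-supplied name to a bundled voice, falling back to the default
/// when the name is unknown. Matching ignores case and surrounding spaces.
func resolveVoice(_ name: String) -> BundledVoice {
    let key = name.trimmingCharacters(in: .whitespaces).lowercased()
    return BundledVoice(rawValue: key) ?? .defaultVoice
}
```

Keeping the names in one `CaseIterable` type also makes it easy to populate a voice picker from `BundledVoice.allCases`.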

	## Usage

	```swift
	import FluidAudioTTS

	// Create the manager and initialize it before the first synthesis call.
	let manager = PocketTtsManager()
	try await manager.initialize()

	// Synthesize speech for the given text.
	let audio = try await manager.synthesize(text: "Hello, world!")
	```

	See https://github.com/FluidInference/FluidAudio for the full Swift framework.

	## License

	CC-BY-4.0, inherited from https://huggingface.co/kyutai/pocket-tts. Attribution to Kyutai is required.

	## References

	- https://huggingface.co/kyutai/pocket-tts
	- https://arxiv.org/abs/2410.00037
	- https://github.com/FluidInference/FluidAudio