	# StyleTTS2 → CoreML iteration_2

	Production-ready fp32 mlpackages adopting Trials 4 + 6 + 8b from
	`coreml/fusions.md`.

	## Pipeline (8 stages, 8 dispatches)

	```
	text_encoder             → CPU_ONLY     fp32   21 MB
	bert                     → ALL          fp32   23 MB
	ref_encoder              → CPU_AND_GPU  fp32  106 MB
	fused_diffusion_sampler  → ALL          fp32   94 MB  ← Trial 4 (replaces diffusion_unet × 8)
	duration_predictor       → CPU_ONLY     fp32   30 MB
	fused_f0n_har_source     → CPU_ONLY     fp32   32 MB  ← Trial 6 (replaces f0n_predictor + har_source)
	decoder_pre              → CPU_AND_NE   fp32  128 MB
	decoder_upsample         → CPU_ONLY     fp32   79 MB
	```

	Total: ~514 MB across 8 mlpackages, with 8 dispatches per utterance.

	## Performance

	Warm latency on an M-series Mac, single process, with no other GPU/ANE workloads:

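	* Warm pipeline latency: ~480–565 ms (down from a ~1030 ms baseline)
	* Stage count: 9 → 8 (Trial 6 merges two packages; Trial 4 swaps one package for another)
	* Dispatches per utterance: 16 → 8 (−50%; Trial 4 removes 7, Trial 6 removes 1)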
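
The stage and dispatch counts can be reconstructed from the stage names in this README. The sketch below is an accounting exercise only: the `LEGACY` table is inferred from the "replaces ..." annotations in the pipeline listing, and the `fuse` helper is hypothetical, not something taken from `coreml/inference.py`.

```python
# Dispatches per utterance for the legacy path, inferred from this README.
LEGACY = {
    "text_encoder": 1,
    "bert": 1,
    "ref_encoder": 1,
    "diffusion_unet": 8,  # dispatched once per UNet evaluation
    "duration_predictor": 1,
    "f0n_predictor": 1,
    "har_source": 1,
    "decoder_pre": 1,
    "decoder_upsample": 1,
}


def fuse(stages, parts, fused_name):
    """Return a copy of `stages` with `parts` collapsed into one single-dispatch stage."""
    out = {}
    for name, dispatches in stages.items():
        if name in parts:
            out.setdefault(fused_name, 1)  # first part creates the fused entry
        else:
            out[name] = dispatches
    return out


after_trial_4 = fuse(LEGACY, {"diffusion_unet"}, "fused_diffusion_sampler")
after_trial_6 = fuse(after_trial_4, {"f0n_predictor", "har_source"}, "fused_f0n_har_source")

print(len(LEGACY), sum(LEGACY.values()))                # 9 16
print(len(after_trial_4), sum(after_trial_4.values()))  # 9 9
print(len(after_trial_6), sum(after_trial_6.values()))  # 8 8
```

Under these assumptions, Trial 4 accounts for seven of the eight removed dispatches and leaves the stage count unchanged, while Trial 6 removes one stage and one dispatch.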

	See `coreml/fusions.md` for full trial history, latency tables, parity
	chains, and per-stage placement sweep results.

	## Adopted trials

	| Trial | Change | Saving |
	|-------|--------|--------|
	| 4 | Fused 5-step ADPM2 sampler (8 dispatches → 1) | −437 ms warm |
	| 6 | Fused f0n_predictor + har_source | −42 ms warm |
	| 8b | bert → ALL, ref_encoder → CPU_AND_GPU, sampler → ALL | small but stable |
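
As a rough sanity check, the quantified savings can be subtracted from the baseline and compared with the measured warm range. This assumes the savings are additive, which the README does not claim outright, and it leaves out Trial 8b because no number is given for it.

```python
BASELINE_MS = 1030                 # "~1030 ms baseline" from the Performance section
SAVINGS_MS = {"4": 437, "6": 42}   # Trial 8b is listed only as "small but stable"
MEASURED_RANGE_MS = (480, 565)     # reported warm pipeline latency

predicted_ms = BASELINE_MS - sum(SAVINGS_MS.values())
low_ms, high_ms = MEASURED_RANGE_MS

print(predicted_ms)                        # 551
print(low_ms <= predicted_ms <= high_ms)   # True
```

The estimate falls inside the reported range, so the per-trial savings and the headline figure are at least mutually consistent.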

	## Skipped / dropped

	| Trial | Outcome |
	|-------|---------|
	| 5 | har + decoder_upsample fuse: partition tax (+290 ms) |
	| 7 | ref_encoder + sampler fuse: partition tax (200 MB graph) |
	| 8a | aggressive `decoder_upsample → ALL`: bimodal 322–759 ms |
	| 9 | `_hifigan_shift` fold: sub-1 ms saving, dominated by Trial 8 |

	## Usage

	Copy or symlink `packages/` into `models/tts/styletts2/coreml/`, then
	run `python -m coreml.inference` from the styletts2 root. The
	`_STAGE_COMPUTE` and `_STAGE_PRECISION` manifests in
	`coreml/inference.py` load these packages by default.
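
For readers loading the packages outside `coreml/inference.py`, a per-stage compute manifest might look like the sketch below. The placements are copied from the pipeline listing in this README. The dictionary layout, the `package_path` and `load_stage` helpers, and the `<stage>.mlpackage` file naming are all assumptions, and the real `_STAGE_COMPUTE` may be structured differently. The loader relies on `coremltools`, imported lazily so the manifest can be inspected without it.

```python
from pathlib import Path

# Stage -> Core ML compute unit name, copied from the pipeline listing.
STAGE_COMPUTE = {
    "text_encoder": "CPU_ONLY",
    "bert": "ALL",
    "ref_encoder": "CPU_AND_GPU",
    "fused_diffusion_sampler": "ALL",
    "duration_predictor": "CPU_ONLY",
    "fused_f0n_har_source": "CPU_ONLY",
    "decoder_pre": "CPU_AND_NE",
    "decoder_upsample": "CPU_ONLY",
}


def package_path(packages_dir, stage):
    # Assumed naming: one "<stage>.mlpackage" directory per stage.
    return Path(packages_dir) / f"{stage}.mlpackage"


def load_stage(packages_dir, stage):
    # Lazy import: coremltools is only needed when a model is actually loaded.
    import coremltools as ct

    unit = getattr(ct.ComputeUnit, STAGE_COMPUTE[stage])
    return ct.models.MLModel(str(package_path(packages_dir, stage)), compute_units=unit)


print(package_path("packages", "bert"))
```

Keeping the placement in one table makes it easy to try a different compute unit for a single stage, which is how placement sweeps such as Trial 8b are typically run.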

	To compare against the legacy 9-package path:

	```bash
	python -m coreml.inference --no-fused
	```
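
When comparing the fused and legacy paths, warm latency can be measured by discarding the first few calls and summarizing the rest. The helper below is a generic sketch, not the timing code used for the numbers in this README. `warm_latency_ms` and the `dummy` workload are hypothetical names, and the callable would need to be replaced with a real call into either pipeline.

```python
import statistics
import time


def warm_latency_ms(fn, warmup=3, runs=20):
    """Call `fn` `warmup` times untimed, then return (min, median, max) in ms over `runs` calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), statistics.median(samples), max(samples)


def dummy():
    # Stand-in workload; replace with a call into the fused or legacy pipeline.
    sum(range(10_000))


fastest, median, slowest = warm_latency_ms(dummy, warmup=2, runs=5)
print(f"min {fastest:.2f} ms, median {median:.2f} ms, max {slowest:.2f} ms")
```

Reporting the minimum and maximum alongside the median helps surface a bimodal spread like the one noted for Trial 8a, which a single average would hide.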