# StyleTTS2 → CoreML iteration_2
Production-ready fp32 mlpackages that adopt Trials 4, 6, and 8b from
`coreml/fusions.md`.
## Pipeline (8 stages, 8 dispatches)
```
text_encoder             → CPU_ONLY     fp32   21 MB
bert                     → ALL          fp32   23 MB
ref_encoder              → CPU_AND_GPU  fp32  106 MB
fused_diffusion_sampler  → ALL          fp32   94 MB  ← Trial 4 (replaces diffusion_unet × 8)
duration_predictor       → CPU_ONLY     fp32   30 MB
fused_f0n_har_source     → CPU_ONLY     fp32   32 MB  ← Trial 6 (replaces f0n_predictor + har_source)
decoder_pre              → CPU_AND_NE   fp32  128 MB
decoder_upsample         → CPU_ONLY     fp32   79 MB
```
Total: **514 MB**, 8 mlpackages, 8 dispatches per utterance.
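
As a rough illustration, the stage table above can be written down as data and each package loaded with its placement through coremltools. This is a minimal sketch, not the contents of `coreml/inference.py`: the names `STAGES` and `load_stage`, and the `<name>.mlpackage` file layout, are assumptions made for the example.

```python
from pathlib import Path

# (stage name, compute-unit placement, size in MB), copied from the table above.
STAGES = [
    ("text_encoder", "CPU_ONLY", 21),
    ("bert", "ALL", 23),
    ("ref_encoder", "CPU_AND_GPU", 106),
    ("fused_diffusion_sampler", "ALL", 94),
    ("duration_predictor", "CPU_ONLY", 30),
    ("fused_f0n_har_source", "CPU_ONLY", 32),
    ("decoder_pre", "CPU_AND_NE", 128),
    ("decoder_upsample", "CPU_ONLY", 79),
]


def load_stage(name, placement, root=Path("packages")):
    """Load one stage with its placement (hypothetical helper; needs coremltools on macOS)."""
    import coremltools as ct  # imported lazily so the table is usable without it

    # Assumption: each package is stored as <root>/<name>.mlpackage.
    return ct.models.MLModel(
        str(root / f"{name}.mlpackage"),
        compute_units=ct.ComputeUnit[placement],
    )


# The rounded per-stage sizes add up to within 1 MB of the stated total.
total_mb = sum(size for _, _, size in STAGES)
print(len(STAGES), "stages,", total_mb, "MB from rounded per-stage sizes")
```

The placement strings match the member names of `coremltools.ComputeUnit`, so the enum lookup by name works without a separate mapping.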
## Performance
Warm latency on an M-series Mac, single process, with no other GPU/ANE workloads:
* Pipeline warm: **~480–565 ms** (down from ~1030 ms baseline)
* Stage count: 9 → 8 (Trials 4 + 6)
* Dispatches per utterance: 16 → 8 (−50%)
See `coreml/fusions.md` for full trial history, latency tables, parity
chains, and per-stage placement sweep results.
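
The stage and dispatch counts can be reconstructed from the pipeline table by expanding each fused stage back into the packages it replaced. The sketch below assumes that every stage other than `diffusion_unet` is dispatched once per utterance, which is consistent with the totals quoted here but is not spelled out in this README.

```python
# Dispatches per utterance for the legacy path, reconstructed from the
# "replaces ..." notes in the pipeline table.
legacy = {
    "text_encoder": 1,
    "bert": 1,
    "ref_encoder": 1,
    "diffusion_unet": 8,  # called 8 times per utterance
    "duration_predictor": 1,
    "f0n_predictor": 1,
    "har_source": 1,
    "decoder_pre": 1,
    "decoder_upsample": 1,
}

# Fused path: assumed one dispatch per package.
fused = {
    "text_encoder": 1,
    "bert": 1,
    "ref_encoder": 1,
    "fused_diffusion_sampler": 1,  # Trial 4
    "duration_predictor": 1,
    "fused_f0n_har_source": 1,  # Trial 6
    "decoder_pre": 1,
    "decoder_upsample": 1,
}

legacy_dispatches = sum(legacy.values())
fused_dispatches = sum(fused.values())
reduction = (legacy_dispatches - fused_dispatches) / legacy_dispatches

print(f"stages: {len(legacy)} -> {len(fused)}")
print(f"dispatches: {legacy_dispatches} -> {fused_dispatches} ({-reduction:.0%})")
```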
## Adopted trials
| Trial | Change | Saving |
|-------|------------------------------------------------------|------|
| 4 | fused 5-step ADPM2 sampler (8 dispatches → 1) | −437 ms warm |
| 6 | fused f0n_predictor + har_source | −42 ms warm |
| 8b | bert→ALL, ref_encoder→CPU_AND_GPU, sampler→ALL | small but stable |
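
As a quick sanity check, the per-trial savings can be subtracted from the baseline and compared with the measured warm range. This sketch assumes the savings are additive and leaves out Trial 8b, which the table does not quantify.

```python
baseline_ms = 1030  # warm baseline quoted in the Performance section

# Warm savings from the table; Trial 8b has no number, so it is omitted.
savings_ms = {"trial_4": 437, "trial_6": 42}

estimate_ms = baseline_ms - sum(savings_ms.values())

# Measured warm range quoted in the Performance section.
measured_low_ms, measured_high_ms = 480, 565
in_range = measured_low_ms <= estimate_ms <= measured_high_ms

print(f"estimated warm latency: {estimate_ms} ms (within measured range: {in_range})")
```

Under these assumptions the estimate falls inside the measured range, so the two quantified trials account for the bulk of the improvement.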
## Skipped / dropped
| Trial | Outcome |
|-------|------------------------------------------------------|
| 5 | har + decoder_upsample fuse — partition tax (+290 ms) |
| 7 | ref_encoder + sampler fuse — partition tax (200 MB graph) |
| 8a | aggressive `decoder_upsample → ALL` — bimodal 322–759 ms |
| 9 | `_hifigan_shift` fold — sub-1 ms saving, dominated by Trial 8 |
## Usage
Copy or symlink `packages/` into `models/tts/styletts2/coreml/`, then
run `python -m coreml.inference` from the styletts2 root. The
`_STAGE_COMPUTE` and `_STAGE_PRECISION` manifests in
`coreml/inference.py` load these packages by default.
To compare against the legacy 9-package path:
```bash
python -m coreml.inference --no-fused
```