# StyleTTS2 → CoreML iteration_2
Production-ready fp32 mlpackages adopting Trials 4 + 6 + 8b from
`coreml/fusions.md`.
## Pipeline (8 stages, 8 dispatches)
```
text_encoder → CPU_ONLY fp32 21 MB
bert → ALL fp32 23 MB
ref_encoder → CPU_AND_GPU fp32 106 MB
fused_diffusion_sampler → ALL fp32 94 MB ← Trial 4 (replaces diffusion_unet × 8)
duration_predictor → CPU_ONLY fp32 30 MB
fused_f0n_har_source → CPU_ONLY fp32 32 MB ← Trial 6 (replaces f0n_predictor + har_source)
decoder_pre → CPU_AND_NE fp32 128 MB
decoder_upsample → CPU_ONLY fp32 79 MB
```
Total: **514 MB**, 8 mlpackages, 8 dispatches per utterance.
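The placement table above maps naturally onto a per-stage manifest. The following is a minimal sketch, not the contents of `coreml/inference.py`: the dict name `STAGE_COMPUTE`, the helper `load_stage`, and the `<stage>.mlpackage` file naming are assumptions made for illustration. It relies on the `compute_units` argument of coremltools' `MLModel` and on the `ComputeUnit` enum.

```python
from pathlib import Path

# Stage -> compute-unit name, copied from the pipeline table above.
# The dict name and layout are illustrative, not the real _STAGE_COMPUTE.
STAGE_COMPUTE = {
    "text_encoder": "CPU_ONLY",
    "bert": "ALL",
    "ref_encoder": "CPU_AND_GPU",
    "fused_diffusion_sampler": "ALL",
    "duration_predictor": "CPU_ONLY",
    "fused_f0n_har_source": "CPU_ONLY",
    "decoder_pre": "CPU_AND_NE",
    "decoder_upsample": "CPU_ONLY",
}


def load_stage(name, packages_dir="packages"):
    """Load one stage on its assigned compute unit (hypothetical helper)."""
    # Imported lazily so the manifest can be inspected without coremltools.
    import coremltools as ct

    # Assumes each package is stored as <stage>.mlpackage; adjust to the real layout.
    path = Path(packages_dir) / f"{name}.mlpackage"
    unit = getattr(ct.ComputeUnit, STAGE_COMPUTE[name])
    return ct.models.MLModel(str(path), compute_units=unit)
```

Keeping the placement in one table like this makes a per-stage sweep a one-line change per trial.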
## Performance
Warm latency on an M-series Mac, single process, with no other GPU/ANE workloads:
* Pipeline warm: **~480–565 ms** (down from ~1030 ms baseline)
* Stage count: 9 → 8 (Trials 4 + 6)
* Dispatches per utterance: 16 → 8 (−50%)
See `coreml/fusions.md` for full trial history, latency tables, parity
chains, and per-stage placement sweep results.
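For readers who want to reproduce a warm-latency range on their own machine, a generic timing harness might look like the sketch below. It is not necessarily how the figures above were collected: the function name `warm_latency_ms` and the warm-up and iteration counts are assumptions, and `run_once` stands in for whatever callable synthesizes one utterance.

```python
import statistics
import time


def warm_latency_ms(run_once, warmup=3, iters=10):
    """Return (min, median, max) wall-clock latency in ms, measured after warm-up."""
    for _ in range(warmup):
        run_once()  # discard cold runs (model load, compilation, cache fill)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), statistics.median(samples), max(samples)
```

Reporting the minimum and maximum alongside the median helps expose bimodal behaviour that a single average would hide.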
## Adopted trials
| Trial | Change                                               | Saving |
|-------|------------------------------------------------------|------|
| 4 | fused 5-step ADPM2 sampler (8 dispatches → 1) | −437 ms warm |
| 6 | fused f0n_predictor + har_source | −42 ms warm |
| 8b | bert→ALL, ref_encoder→CPU_AND_GPU, sampler→ALL | small but stable |
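The stage and dispatch counts quoted in this README can be checked with a little bookkeeping. The sketch below assumes every legacy stage dispatches once per utterance except `diffusion_unet`, which the table lists as 8 dispatches; the legacy names other than `diffusion_unet`, `f0n_predictor`, and `har_source` are assumed to match the current package names. The latency line simply subtracts the two quoted savings from the ~1030 ms baseline.

```python
# Per-utterance dispatch counts for the legacy 9-package path (assumed: one
# dispatch per stage, except diffusion_unet at 8).
LEGACY = {
    "text_encoder": 1,
    "bert": 1,
    "ref_encoder": 1,
    "diffusion_unet": 8,
    "duration_predictor": 1,
    "f0n_predictor": 1,
    "har_source": 1,
    "decoder_pre": 1,
    "decoder_upsample": 1,
}


def apply_fusions(stages):
    """Apply Trials 4 and 6 to a stage -> dispatch-count mapping."""
    fused = dict(stages)
    # Trial 4: the diffusion_unet calls collapse into one fused sampler dispatch.
    del fused["diffusion_unet"]
    fused["fused_diffusion_sampler"] = 1
    # Trial 6: f0n_predictor and har_source become a single stage.
    del fused["f0n_predictor"]
    del fused["har_source"]
    fused["fused_f0n_har_source"] = 1
    return fused


FUSED = apply_fusions(LEGACY)
print(len(LEGACY), sum(LEGACY.values()))  # 9 16
print(len(FUSED), sum(FUSED.values()))    # 8 8

# Baseline minus the Trial 4 and Trial 6 savings, in ms.
estimate_ms = 1030 - 437 - 42
print(estimate_ms)  # 551
```

The 551 ms estimate falls inside the ~480–565 ms warm range reported under Performance.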
## Skipped / dropped
| Trial | Outcome |
|-------|------------------------------------------------------|
| 5 | har + decoder_upsample fuse — partition tax (+290 ms) |
| 7 | ref_encoder + sampler fuse — partition tax (200 MB graph) |
| 8a | aggressive `decoder_upsample → ALL` — bimodal 322–759 ms |
| 9 | `_hifigan_shift` fold — sub-1 ms saving, dominated by Trial 8 |
## Usage
Copy or symlink `packages/` into `models/tts/styletts2/coreml/`, then
run `python -m coreml.inference` from the styletts2 root. The
`_STAGE_COMPUTE` and `_STAGE_PRECISION` manifests in
`coreml/inference.py` load these packages by default.
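If you prefer a symlink to a copy, something along these lines should work. It is a sketch under assumptions: `link_packages` is a hypothetical helper, and it assumes the link should be named `packages` inside the coreml directory, which is one reading of the sentence above.

```python
from pathlib import Path


def link_packages(src, coreml_dir):
    """Symlink a downloaded packages/ directory into the coreml directory."""
    src = Path(src).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    dest = Path(coreml_dir) / "packages"
    # Refuse to overwrite an existing copy or a stale link.
    if dest.is_symlink() or dest.exists():
        raise FileExistsError(f"{dest} already exists; remove it first")
    dest.symlink_to(src, target_is_directory=True)
    return dest
```

For example, `link_packages("~/Downloads/packages", "models/tts/styletts2/coreml")` would apply once the `~` is expanded with `Path.expanduser()`; both paths here are placeholders.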
To compare against the legacy 9-package path:
```bash
python -m coreml.inference --no-fused
```