# Quantization Matrix (CoreML)
This repository publishes only CoreML artifacts at 8-bit precision or higher.
4-bit variants are excluded because of quality degradation.
## Naming rules
The folder name encodes the intended runtime and quantization approach:
- `coreml_*`: generic CoreML export.
- `coreml_ios18_*`: tuned for iOS 18 CoreML runtime.
- `int8`: int8 weights for one or more stages.
- `vocoder_only`: only the vocoder is quantized (per naming).
- `both`: multiple stages are quantized (per naming).
- `compressed` / `linear8`: linear 8-bit compression for a smaller memory footprint.
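
The naming rules above can be applied mechanically. The following is a minimal sketch, assuming the folder names follow exactly the tokens listed; `describe_variant` and the keys of the returned dictionary are hypothetical names chosen for illustration and are not part of this repository.

```python
def describe_variant(folder: str) -> dict:
    """Decode a variant folder name using the naming tokens listed above.

    Hypothetical helper: it only reads the name and does not inspect
    the model packages inside the folder.
    """
    tokens = folder.split("_")
    if tokens[0] != "coreml":
        raise ValueError(f"not a CoreML variant folder: {folder!r}")

    # Which stages are quantized, as far as the name states it.
    if "vocoder_only" in folder:
        scope = "vocoder_only"
    elif "both" in tokens:
        scope = "both"
    else:
        scope = None  # the name does not say which stages are quantized

    return {
        "ios18": "ios18" in tokens,
        "int8": "int8" in tokens,
        "linear8": "compressed" in tokens or "linear8" in tokens,
        "scope": scope,
    }
```

For example, `describe_variant("coreml_ios18_int8_vocoder_only")` reports an iOS 18 target with int8 weights applied to the vocoder only.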
## Variant table
| Variant folder | Quantization (by name) | Expected tradeoff | When to use |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | best quality, larger | baseline quality checks |
| `coreml_int8` | int8 (all stages) | faster, smaller | general fast inference |
| `coreml_compressed` | linear8 | smallest memory | low-memory devices |
| `coreml_ios18` | full precision (mlprogram) | best quality on iOS 18 | iOS 18+ devices |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | balanced | iOS 18+ with minimal quality loss |
| `coreml_ios18_int8_both` | int8 (multiple stages) | faster, more loss | iOS 18+ when latency matters |
| `coreml_compressed_ios18` | linear8 (subset) | smallest memory | iOS 18+ with tight memory |
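
As a rough illustration of how the "When to use" column could drive selection in code, here is a sketch. The function `choose_variant` and the priority labels (`"quality"`, `"balanced"`, `"latency"`, `"memory"`) are assumptions made for this example; only the folder names come from the table.

```python
# Mapping derived from the table above: (targets iOS 18+, priority) -> folder.
# The priority labels are hypothetical shorthand for the "When to use" column.
VARIANTS = {
    (False, "quality"): "coreml",
    (False, "latency"): "coreml_int8",
    (False, "memory"): "coreml_compressed",
    (True, "quality"): "coreml_ios18",
    (True, "balanced"): "coreml_ios18_int8_vocoder_only",
    (True, "latency"): "coreml_ios18_int8_both",
    (True, "memory"): "coreml_compressed_ios18",
}


def choose_variant(ios18: bool, priority: str) -> str:
    """Return the variant folder matching the runtime and priority."""
    try:
        return VARIANTS[(ios18, priority)]
    except KeyError:
        raise ValueError(
            f"no variant listed for ios18={ios18}, priority={priority!r}"
        ) from None
```

The table lists no "balanced" option for the generic export, so `choose_variant(False, "balanced")` raises a `ValueError` rather than guessing a substitute.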
## Steps vs. quality
The `steps` parameter sets the number of denoiser iterations:
- Fewer steps = faster, lower fidelity.
- More steps = slower, higher fidelity.
Recommended starting points:
- **Fast preview:** 10 steps
- **Balanced:** 20 steps
- **Higher quality:** 30 steps
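
The starting points above could be kept as named presets so that callers do not hard-code numbers. This is a sketch only: `STEP_PRESETS` and `steps_for` are hypothetical names, and how the resulting value is passed to the synthesis call depends on the calling code.

```python
# Recommended starting points from the list above.
STEP_PRESETS = {
    "fast_preview": 10,
    "balanced": 20,
    "higher_quality": 30,
}


def steps_for(preset: str) -> int:
    """Look up the denoiser step count for a named preset."""
    if preset not in STEP_PRESETS:
        known = ", ".join(sorted(STEP_PRESETS))
        raise ValueError(f"unknown preset {preset!r}; expected one of: {known}")
    return STEP_PRESETS[preset]
```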
## Excluded variants
The following are intentionally not published:
- `coreml_ios18_int4_only`
- `coreml_ios18_int4_int8`
- any package with `int4` or `linear4` in its filename
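
The exclusion rule is a plain substring check on the name. A minimal sketch, where `is_excluded` is a hypothetical helper name:

```python
EXCLUDED_MARKERS = ("int4", "linear4")


def is_excluded(name: str) -> bool:
    """Return True if a folder or package name contains a 4-bit marker."""
    return any(marker in name for marker in EXCLUDED_MARKERS)
```

A check like this could run before upload to confirm that no 4-bit package slips into a published folder.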