| # Quantization Matrix (CoreML) | |
| This repository publishes only >=8-bit CoreML artifacts. 4-bit variants are | |
| excluded due to quality. | |
| ## Naming rules | |
| The folder name encodes the intended runtime and quantization approach: | |
| - `coreml_*`: generic CoreML export. | |
| - `coreml_ios18_*`: tuned for iOS 18 CoreML runtime. | |
| - `int8`: int8 weights for one or more stages. | |
| - `vocoder_only`: only the vocoder is quantized (per naming). | |
| - `both`: multiple stages are quantized (per naming). | |
| - `compressed` / `linear8`: linear 8-bit compression for smaller memory. | |
| ## Variant table | |
| | Variant folder | Quantization (by name) | Expected tradeoff | When to use | | |
| | --- | --- | --- | --- | | |
| | `coreml` | full precision (mixed) | best quality, larger | baseline quality checks | | |
| | `coreml_int8` | int8 (all stages) | faster, smaller | general fast inference | | |
| | `coreml_compressed` | linear8 | smallest memory | low-memory devices | | |
| | `coreml_ios18` | full precision (mlprogram) | best quality on iOS 18 | iOS 18+ devices | | |
| | `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | balanced | iOS 18+ with minimal quality loss | | |
| | `coreml_ios18_int8_both` | int8 (multiple stages) | faster, more loss | iOS 18+ when latency matters | | |
| | `coreml_compressed_ios18` | linear8 (subset) | smallest memory | iOS 18+ with tight memory | | |
| ## Steps vs. quality | |
| The `steps` parameter controls the denoiser iterations: | |
| - Fewer steps = faster, lower fidelity. | |
| - More steps = slower, higher fidelity. | |
| Recommended starting points: | |
| - **Fast preview:** 10 steps | |
| - **Balanced:** 20 steps | |
| - **Higher quality:** 30 steps | |
| ## Excluded variants | |
| The following are intentionally not published: | |
| - `coreml_ios18_int4_only` | |
| - `coreml_ios18_int4_int8` | |
| - any package with `int4` or `linear4` in its filename | |