# Quantization Matrix (CoreML)

This repository publishes only >=8-bit CoreML artifacts. 4-bit variants are excluded due to quality.

## Naming rules

The folder name encodes the intended runtime and quantization approach:

- `coreml_*`: generic CoreML export.
- `coreml_ios18_*`: tuned for iOS 18 CoreML runtime.
- `int8`: int8 weights for one or more stages.
- `vocoder_only`: only the vocoder is quantized (per naming).
- `both`: multiple stages are quantized (per naming).
- `compressed` / `linear8`: linear 8-bit compression for smaller memory.

## Variant table

| Variant folder | Quantization (by name) | Expected tradeoff | When to use |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | best quality, larger | baseline quality checks |
| `coreml_int8` | int8 (all stages) | faster, smaller | general fast inference |
| `coreml_compressed` | linear8 | smallest memory | low-memory devices |
| `coreml_ios18` | full precision (mlprogram) | best quality on iOS 18 | iOS 18+ devices |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | balanced | iOS 18+ with minimal quality loss |
| `coreml_ios18_int8_both` | int8 (multiple stages) | faster, more loss | iOS 18+ when latency matters |
| `coreml_compressed_ios18` | linear8 (subset) | smallest memory | iOS 18+ with tight memory |

## Steps vs. quality

The `steps` parameter controls the denoiser iterations:

- Fewer steps = faster, lower fidelity.
- More steps = slower, higher fidelity.

Recommended starting points:

- **Fast preview:** 10 steps
- **Balanced:** 20 steps
- **Higher quality:** 30 steps

## Excluded variants

The following are intentionally not published:

- `coreml_ios18_int4_only`
- `coreml_ios18_int4_int8`
- any package with `int4` or `linear4` in its filename
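
The naming rules, step presets, and exclusion rule described in this document can be sketched as a small Python helper. This is only an illustration, not tooling shipped with the repository: the names `describe_variant`, `is_excluded`, `STEP_PRESETS`, and `EXCLUDED_MARKERS` are hypothetical, and the parsing assumes folder names use `_` as the token separator exactly as in the variant table. When a name does not state which stages are quantized, the sketch reports the scope as `None` instead of guessing.

```python
from typing import Optional

# Markers that identify unpublished 4-bit packages (from "Excluded variants").
EXCLUDED_MARKERS = ("int4", "linear4")

# Recommended starting points for the `steps` parameter (from "Steps vs. quality").
# The preset keys are hypothetical labels chosen for this sketch.
STEP_PRESETS = {"fast_preview": 10, "balanced": 20, "higher_quality": 30}


def is_excluded(name: str) -> bool:
    """Return True if a package or folder name contains a 4-bit marker."""
    lowered = name.lower()
    return any(marker in lowered for marker in EXCLUDED_MARKERS)


def describe_variant(folder: str) -> dict:
    """Decode runtime, quantization, and scope from a variant folder name."""
    name = folder.lower()
    tokens = name.split("_")
    if tokens[0] != "coreml":
        raise ValueError(f"not a CoreML variant folder: {folder!r}")
    if is_excluded(name):
        raise ValueError(f"4-bit variants are not published: {folder!r}")

    # `coreml_ios18_*` (or a trailing `ios18`) targets the iOS 18 runtime.
    runtime = "ios18" if "ios18" in tokens else "generic"

    # `int8` takes precedence; `compressed` / `linear8` mean linear 8-bit.
    if "int8" in tokens:
        quantization = "int8"
    elif "compressed" in tokens or "linear8" in tokens:
        quantization = "linear8"
    else:
        quantization = "full precision"

    # Scope is reported only when the name states it explicitly.
    scope: Optional[str] = None
    if "vocoder_only" in name:
        scope = "vocoder_only"
    elif "both" in tokens:
        scope = "both"

    return {"runtime": runtime, "quantization": quantization, "scope": scope}


if __name__ == "__main__":
    for folder in ("coreml", "coreml_ios18_int8_vocoder_only", "coreml_compressed_ios18"):
        print(folder, describe_variant(folder))
    print("balanced steps:", STEP_PRESETS["balanced"])
```

Rejecting excluded names with an error, instead of silently decoding them, keeps a script from accidentally referencing a 4-bit package that the repository does not publish.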