Quantization Matrix (CoreML)

This repository publishes only CoreML artifacts quantized to 8 bits or wider. 4-bit variants are excluded because of their quality loss.

Naming rules

The folder name encodes the intended runtime and quantization approach:

  • coreml_*: generic CoreML export.
  • coreml_ios18_* (or an _ios18 suffix, as in coreml_compressed_ios18): tuned for the iOS 18 CoreML runtime.
  • int8: int8 weights for one or more stages.
  • vocoder_only: only the vocoder is quantized, as indicated by the name.
  • both: multiple stages are quantized, as indicated by the name.
  • compressed / linear8: linear 8-bit compression for a smaller memory footprint.
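
The naming rules above can be applied mechanically. The following is a minimal sketch, not part of this repository: `VariantInfo` and `parse_variant_name` are hypothetical names, and the decoding relies only on the tokens listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantInfo:
    """What a folder name claims about a variant (hypothetical helper type)."""
    ios18: bool    # name contains the ios18 token
    int8: bool     # name contains the int8 token
    linear8: bool  # name contains compressed or linear8
    scope: str     # "vocoder_only", "both", or "unspecified"


def parse_variant_name(folder: str) -> VariantInfo:
    """Decode a variant folder name using the naming rules listed above."""
    if not folder.startswith("coreml"):
        raise ValueError(f"not a CoreML variant folder: {folder!r}")
    tokens = folder.split("_")
    if "vocoder_only" in folder:
        scope = "vocoder_only"
    elif "both" in tokens:
        scope = "both"
    else:
        scope = "unspecified"
    return VariantInfo(
        ios18="ios18" in tokens,
        int8="int8" in tokens,
        linear8="compressed" in tokens or "linear8" in tokens,
        scope=scope,
    )


info = parse_variant_name("coreml_ios18_int8_vocoder_only")
print(info.ios18, info.int8, info.scope)
```

The result reflects only what the name claims; it does not inspect the model files themselves.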

Variant table

| Variant folder | Quantization (by name) | Expected tradeoff | When to use |
| --- | --- | --- | --- |
| coreml | full precision (mixed) | best quality, larger | baseline quality checks |
| coreml_int8 | int8 (all stages) | faster, smaller | general fast inference |
| coreml_compressed | linear8 | smallest memory | low-memory devices |
| coreml_ios18 | full precision (mlprogram) | best quality on iOS 18 | iOS 18+ devices |
| coreml_ios18_int8_vocoder_only | int8 (vocoder only) | balanced | iOS 18+ with minimal quality loss |
| coreml_ios18_int8_both | int8 (multiple stages) | faster, more loss | iOS 18+ when latency matters |
| coreml_compressed_ios18 | linear8 (subset) | smallest memory | iOS 18+ with tight memory |
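
To illustrate how the table might drive variant selection in an app, here is a small sketch. The `goal` labels ("quality", "speed", "memory", "balanced") and the `pick_variant` function are assumptions made for this example, condensed from the "Expected tradeoff" and "When to use" columns; they are not an API of this repository.

```python
# Folder names come from the variant table; the "goal" labels are an
# illustrative condensation of its tradeoff columns.
VARIANTS = {
    "coreml": {"ios18": False, "goal": "quality"},
    "coreml_int8": {"ios18": False, "goal": "speed"},
    "coreml_compressed": {"ios18": False, "goal": "memory"},
    "coreml_ios18": {"ios18": True, "goal": "quality"},
    "coreml_ios18_int8_vocoder_only": {"ios18": True, "goal": "balanced"},
    "coreml_ios18_int8_both": {"ios18": True, "goal": "speed"},
    "coreml_compressed_ios18": {"ios18": True, "goal": "memory"},
}


def pick_variant(goal: str, ios18: bool) -> str:
    """Return the first variant folder matching the goal and runtime target."""
    for name, props in VARIANTS.items():
        if props["ios18"] == ios18 and props["goal"] == goal:
            return name
    raise LookupError(f"no variant for goal={goal!r}, ios18={ios18}")


print(pick_variant("memory", ios18=True))
```

Note that the table lists no "balanced" variant outside iOS 18, so that combination raises `LookupError` in this sketch.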

Steps vs. quality

The steps parameter controls the number of denoiser iterations:

  • Fewer steps = faster, lower fidelity.
  • More steps = slower, higher fidelity.

Recommended starting points:

  • Fast preview: 10 steps
  • Balanced: 20 steps
  • Higher quality: 30 steps
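
The starting points above could be captured as named presets. This is a sketch under assumptions: the preset keys and the `resolve_steps` helper are hypothetical, and only the step counts (10, 20, 30) come from this document.

```python
from typing import Optional

# Step counts from the recommended starting points; the key names are illustrative.
STEP_PRESETS = {"fast": 10, "balanced": 20, "quality": 30}


def resolve_steps(preset: str = "balanced", override: Optional[int] = None) -> int:
    """Return the denoiser step count for a preset, or an explicit override."""
    if override is not None:
        if override < 1:
            raise ValueError("steps must be a positive integer")
        return override
    if preset not in STEP_PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(STEP_PRESETS)}")
    return STEP_PRESETS[preset]


print(resolve_steps("fast"))         # preset lookup
print(resolve_steps(override=24))    # explicit value wins over the preset
```

Keeping an explicit override makes it easy to tune between the presets once a variant has been chosen.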

Excluded variants

The following are intentionally not published:

  • coreml_ios18_int4_only
  • coreml_ios18_int4_int8
  • any package with int4 or linear4 in its filename
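
A publishing script could enforce this exclusion with a simple name check. The sketch below is an assumption about how such a check might look, not the repository's actual tooling; `is_publishable` is a hypothetical name, and it checks every path component (folders as well as the filename), which is slightly broader than the rule as stated.

```python
import re
from pathlib import PurePosixPath

# Markers named in the exclusion list above.
_EXCLUDED = re.compile(r"int4|linear4")


def is_publishable(path: str) -> bool:
    """Return False if any component of the path mentions int4 or linear4."""
    parts = PurePosixPath(path).parts
    return not any(_EXCLUDED.search(part.lower()) for part in parts)


candidates = [
    "coreml_ios18_int8_both",
    "coreml_ios18_int4_only",
    "coreml_ios18_int4_int8/vocoder.mlpackage",
]
print([p for p in candidates if is_publishable(p)])
```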