---
license: openrail
language:
  - en
  - ko
  - es
  - pt
  - fr
pipeline_tag: text-to-speech
tags:
  - coreml
  - ios
  - macos
  - tts
  - supertonic
  - mlprogram
---

# Supertonic-2 CoreML

This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It targets on-device inference and ships several variants, none quantized below 8 bits.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml

## Code & demo

The GitHub repo contains:
- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`

## What is included

- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: SHA-256 checksums for all files
- `tests/`: smoke tests for CoreML model loading
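
If you download artifacts at runtime instead of bundling them, you may want to check them against `SHA256SUMS`. The following is a minimal sketch, assuming the file uses the usual `sha256sum` line format (`<hex digest>  <relative path>`), which is not confirmed for this repository. The function names are hypothetical, and the hashing part is only compiled where CryptoKit is available.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#endif

// Parses lines of the form "<64 hex chars>  <path>" into [path: digest].
// The line format is an assumption (standard `sha256sum` output).
func parseChecksums(_ text: String) -> [String: String] {
    var result: [String: String] = [:]
    for line in text.split(whereSeparator: \.isNewline) {
        let parts = line.split(separator: " ", maxSplits: 1, omittingEmptySubsequences: true)
        guard parts.count == 2, parts[0].count == 64 else { continue }
        var path = parts[1].trimmingCharacters(in: .whitespaces)
        if path.hasPrefix("*") { path.removeFirst() }  // binary-mode marker
        result[path] = parts[0].lowercased()
    }
    return result
}

#if canImport(CryptoKit)
// Hashes a whole file in memory; fine for a sketch, heavy for large weights.
func sha256Hex(of fileURL: URL) throws -> String {
    let data = try Data(contentsOf: fileURL)
    return SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

// Returns the relative paths whose digest differs from the expected one.
func mismatchedFiles(root: URL, checksums: [String: String]) throws -> [String] {
    var bad: [String] = []
    for (path, expected) in checksums {
        let actual = try sha256Hex(of: root.appendingPathComponent(path))
        if actual != expected { bad.append(path) }
    }
    return bad.sorted()
}
#endif
```

An empty result from `mismatchedFiles` means every listed file matched its digest; a missing file surfaces as a thrown error.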

## Quickstart (iOS / macOS)

1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the
   reference implementation.

## Required files (checklist)

Bundle the following into your app:

- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`

## Minimal iOS integration

```swift
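// Example usage (see the demo app for full UI + playback).
// `TTSService` is defined in the demo app, not in a system framework.
do {
    let service = try TTSService(computeUnits: .all)
    let result = try service.synthesize(
        text: "Hello from CoreML!",
        language: .en,
        voiceName: "F1",
        steps: 20,  // see "Steps vs. quality" for the trade-off
        speed: 1.0,
        silenceSeconds: 0.3
    )
    print("WAV file:", result.url)
} catch {
    print("Synthesis failed:", error)
}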
```

To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).

## Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18.

Bundle these files in your app:

```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```

In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).
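
Before loading the models, it can help to confirm that the bundle matches the layout shown in this section. The sketch below assumes exactly that layout (the variant folder, `voice_styles/`, `embeddings/`, and `onnx/` side by side under one root); `requiredPaths` and `missingRequiredFiles` are hypothetical helpers, not part of the demo app.

```swift
import Foundation

// Relative paths expected under the resource root for one variant.
// The layout mirrors the bundle tree above and is an assumption about
// how you package the files.
func requiredPaths(variantFolder: String) -> [String] {
    let stages = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]
    let models = stages.map { "\(variantFolder)/\($0)_mlprogram.mlpackage" }
    return models + [
        "voice_styles",
        "embeddings",
        "onnx/unicode_indexer.json",
        "onnx/tts.json",
    ]
}

// Returns every required path that does not exist under `root`.
func missingRequiredFiles(root: URL, variantFolder: String) -> [String] {
    return requiredPaths(variantFolder: variantFolder).filter { relative in
        !FileManager.default.fileExists(atPath: root.appendingPathComponent(relative).path)
    }
}
```

In an app you would typically pass the bundle's resource directory (for example `Bundle.main.resourceURL`) as `root` and log or assert on a non-empty result during development.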

## Choosing a variant

Use the folder naming to select the right artifact:

- `coreml`: full precision, baseline quality
- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory footprint (linear8)
- `coreml_ios18_*`: built for the iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded because of their quality loss.

## Variant matrix (quick view)

| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.
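
The table above can be encoded as a small type so that folder names are not scattered through the app as string literals. This is a sketch only: `ModelVariant` is a hypothetical helper, the folder names are copied from the table, and `requiresIOS18` simply mirrors the "Intended target" column.

```swift
// Variant folder names copied from the variant matrix. The enum is a
// hypothetical helper and not part of the demo app.
enum ModelVariant: String, CaseIterable {
    case full = "coreml"
    case int8 = "coreml_int8"
    case compressed = "coreml_compressed"
    case ios18 = "coreml_ios18"
    case ios18Int8VocoderOnly = "coreml_ios18_int8_vocoder_only"
    case ios18Int8Both = "coreml_ios18_int8_both"
    case compressedIOS18 = "coreml_compressed_ios18"

    // Folder to look up in the bundle.
    var folderName: String { rawValue }

    // True for variants whose intended target is iOS 18+.
    var requiresIOS18: Bool {
        switch self {
        case .ios18, .ios18Int8VocoderOnly, .ios18Int8Both, .compressedIOS18:
            return true
        case .full, .int8, .compressed:
            return false
        }
    }

    // Variants that can be considered, given the OS level of the device.
    static func candidates(isIOS18OrLater: Bool) -> [ModelVariant] {
        return allCases.filter { isIOS18OrLater || !$0.requiresIOS18 }
    }
}
```

On a device, the `isIOS18OrLater` flag would usually come from an `if #available(iOS 18, *)` check.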

## Steps vs. quality (quick guide)

| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |
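
One way to keep these step counts in a single place is a small preset type. The step values come from the table above; the `QualityPreset` name and its cases are hypothetical and not part of the demo app.

```swift
// Step counts taken from the steps/quality table; preset names are
// illustrative only.
enum QualityPreset: CaseIterable {
    case fastest, balanced, best

    // Number of steps to request for this preset.
    var steps: Int {
        switch self {
        case .fastest: return 10
        case .balanced: return 20
        case .best: return 30
        }
    }
}
```

A call site could then pass `QualityPreset.balanced.steps` wherever a step count is expected, assuming the parameter takes an `Int`.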

## Troubleshooting

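- **Missing resource error:** Ensure the `resources/` subfolders (`voice_styles/`, `embeddings/`, `onnx/`) are bundled and keep their exact names.
- **Model not found:** Confirm that the CoreML folder name in your code matches the bundled folder (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that your iOS deployment target matches the variant; the variants marked iOS 18+ in the variant matrix need iOS 18 or later.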

## Tests

The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.

## Attribution and license

This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).