---
license: openrail
language:
- en
- ko
- es
- pt
- fr
pipeline_tag: text-to-speech
tags:
- coreml
- ios
- macos
- tts
- supertonic
- mlprogram
---
# Supertonic-2 CoreML
This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It targets on-device inference and ships several quantization variants, all at 8-bit precision or higher.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml
## Code & demo
The GitHub repo contains:
- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`
## What is included
- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: sha256 checksums for all files
- `tests/`: smoke tests for CoreML model loading
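One way to use `SHA256SUMS` after downloading is to compare each file's digest against its listed value. The sketch below assumes the file follows the common `sha256sum` output format (a 64-character hex digest, whitespace, then a relative path); this README does not confirm that format, and the function names `parseSHA256Sums` and `sha256Hex` are hypothetical. Hashing uses CryptoKit, so that part compiles only on Apple platforms.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#endif

/// Parses text in the ASSUMED `sha256sum` format: "<64 hex chars>  <relative path>".
/// Returns a dictionary mapping each relative path to its lowercase hex digest.
/// Lines that do not match the expected shape are skipped.
func parseSHA256Sums(_ text: String) -> [String: String] {
    let isGap: (Character) -> Bool = { $0 == " " || $0 == "\t" }
    var digests: [String: String] = [:]
    for line in text.split(whereSeparator: \.isNewline) {
        // The digest ends at the first space or tab; the rest is the path.
        guard let gap = line.firstIndex(where: isGap) else { continue }
        let digest = line[..<gap]
        var path = line[gap...].drop(while: isGap)
        // `sha256sum` marks binary mode with a leading "*".
        if path.hasPrefix("*") { path = path.dropFirst() }
        guard digest.count == 64, !path.isEmpty else { continue }
        digests[String(path)] = digest.lowercased()
    }
    return digests
}

#if canImport(CryptoKit)
/// Returns the lowercase hex SHA-256 digest of the file at `url`.
func sha256Hex(ofFileAt url: URL) throws -> String {
    let data = try Data(contentsOf: url, options: .mappedIfSafe)
    return SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}
#endif
```

A file passes when `sha256Hex(ofFileAt:)` equals the entry that `parseSHA256Sums` returns for its relative path.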
## Quickstart (iOS / macOS)
1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the reference implementation.
## Required files (checklist)
Bundle the following into your app:
- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`
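The checklist above can be turned into a startup check that reports which items are missing. This is a minimal sketch, not code from the demo app: the helper names `requiredPaths` and `missingResources` are hypothetical, and the directory layout (variant folder, `voice_styles/`, `embeddings/`, and `onnx/` side by side under one root) is an assumption modeled on the bundle tree in the iOS 18 example in this README.

```swift
import Foundation

/// Relative paths from the checklist. `variantFolder` is the CoreML folder
/// name, for example "coreml_int8". The layout is an assumption (see lead-in).
func requiredPaths(variantFolder: String) -> [String] {
    let stages = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]
    let models = stages.map { "\(variantFolder)/\($0)_mlprogram.mlpackage" }
    return models + [
        "voice_styles",
        "embeddings",
        "onnx/unicode_indexer.json",
        "onnx/tts.json",
    ]
}

/// Returns the required paths that do not exist under `root`.
func missingResources(under root: URL, variantFolder: String) -> [String] {
    requiredPaths(variantFolder: variantFolder).filter { relative in
        !FileManager.default.fileExists(atPath: root.appendingPathComponent(relative).path)
    }
}
```

In an app, `root` would typically be derived from `Bundle.main.resourceURL`; an empty result means every checklist item was found. Note that packages added to an Xcode target as models may be compiled at build time and appear under a different name, in which case this check would need adjusting.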
## Minimal iOS integration
```swift
// Example usage. `TTSService` comes from the demo app, which also covers UI and playback.
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```
To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).
## Example: iOS 18 `int8_both`
This variant uses int8 weights for multiple stages on iOS 18.
Bundle these files in your app:
```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```
In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).
## Choosing a variant
Use the folder name to select the right artifact:
- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory footprint (linear8)
- `coreml_ios18_*`: for the iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded because of quality loss.
## Variant matrix (quick view)
| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.
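The matrix can also be encoded in code so that variant folder names are not scattered around an app as string literals. The following is an illustrative sketch: the `ModelVariant` type and `usableVariants` function are hypothetical and not part of the demo app, and the iOS 18 requirement is inferred only from the "Intended target" column, not read from the model packages.

```swift
import Foundation

/// Variant folders from the matrix. Raw values are the folder names.
enum ModelVariant: String, CaseIterable {
    case baseline = "coreml"
    case int8 = "coreml_int8"
    case compressed = "coreml_compressed"
    case ios18 = "coreml_ios18"
    case ios18Int8VocoderOnly = "coreml_ios18_int8_vocoder_only"
    case ios18Int8Both = "coreml_ios18_int8_both"
    case ios18Compressed = "coreml_compressed_ios18"

    var folderName: String { rawValue }

    /// Mirrors the "Intended target" column: every iOS 18+ folder has "ios18" in its name.
    var requiresIOS18: Bool { rawValue.contains("ios18") }
}

/// Variants whose intended target is compatible with the given iOS major version.
func usableVariants(iosMajorVersion: Int) -> [ModelVariant] {
    ModelVariant.allCases.filter { !$0.requiresIOS18 || iosMajorVersion >= 18 }
}
```

On a device, the major version could come from `ProcessInfo.processInfo.operatingSystemVersion.majorVersion`; the final choice among the usable variants still depends on the speed and fidelity trade-offs listed in the Notes column.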
## Steps vs. quality (quick guide)
| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |
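If an app exposes this trade-off as a setting, the three rows can be mapped to named presets. This is a small sketch with a hypothetical `QualityPreset` type; only the step counts 10, 20, and 30 come from the table.

```swift
/// Step counts from the table; the preset names are illustrative.
enum QualityPreset: Int, CaseIterable {
    case fastest = 10
    case balanced = 20
    case best = 30

    /// Value intended for the `steps` argument of a synthesis call.
    var steps: Int { rawValue }

    /// Picks the preset whose step count is closest to `requested`.
    /// On a tie, the preset with fewer steps wins.
    static func nearest(to requested: Int) -> QualityPreset {
        // `allCases` is never empty, so the force unwrap cannot fail.
        allCases.min(by: { abs($0.steps - requested) < abs($1.steps - requested) })!
    }
}
```

The resulting `steps` value would be passed where the integration example uses `steps: 20`.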
## Troubleshooting
- **Missing resource error:** Ensure the `resources/` folders are bundled and that their names match exactly.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that your iOS deployment target matches the variant (the `coreml_ios18_*` folders target iOS 18+).
## Tests
The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.
## Attribution and license
This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).