---
license: openrail
language:
- en
- ko
- es
- pt
- fr
pipeline_tag: text-to-speech
tags:
- coreml
- ios
- macos
- tts
- supertonic
- mlprogram
---

# Supertonic-2 CoreML

This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It focuses on on-device inference with multiple >=8-bit quantization variants.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml

## Code & demo

The GitHub repo contains:
- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`

## What is included

- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: sha256 checksums for all files
- `tests/`: smoke tests for CoreML model loading

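Because the download ships with `SHA256SUMS`, it can be worth checking the files before bundling them. The sketch below is one way to do that in Python. It assumes `SHA256SUMS` uses the common `sha256sum` layout (`<hex digest>  <relative path>` per line), which this README does not confirm; the function names are hypothetical and not part of the repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256sums(root: Path, sums_file: str = "SHA256SUMS") -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    failures = []
    for line in (root / sums_file).read_text().splitlines():
        if not line.strip():
            continue
        # Assumed format: "<hex digest>  <relative path>" (sha256sum style).
        expected, _, rel_path = line.partition(" ")
        rel_path = rel_path.strip().lstrip("*")  # "*" marks binary mode
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Calling `verify_sha256sums(Path("."))` from the repository root returns an empty list when every listed file matches.
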
## Quickstart (iOS / macOS)

1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the
   reference implementation.

## Required files (checklist)

Bundle the following into your app:

- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`

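The checklist above can be checked mechanically before a build. The following is a minimal sketch, not a script from the repository: `missing_bundle_items` is a hypothetical helper, and the directory arguments are whatever paths you copy the variant and `resources/` into.

```python
from pathlib import Path

# The four model stages named in the checklist.
STAGE_PACKAGES = [
    "duration_predictor_mlprogram.mlpackage",
    "text_encoder_mlprogram.mlpackage",
    "vector_estimator_mlprogram.mlpackage",
    "vocoder_mlprogram.mlpackage",
]
# Resource folders and files named in the checklist, relative to resources/.
RESOURCE_PATHS = [
    "voice_styles",
    "embeddings",
    "onnx/unicode_indexer.json",
    "onnx/tts.json",
]


def missing_bundle_items(variant_dir: Path, resources_dir: Path) -> list[str]:
    """Return every checklist entry that does not exist on disk."""
    expected = [variant_dir / name for name in STAGE_PACKAGES]
    expected += [resources_dir / rel for rel in RESOURCE_PATHS]
    return [str(path) for path in expected if not path.exists()]
```

For example, `missing_bundle_items(Path("models/coreml_int8"), Path("resources"))` would list anything absent, assuming the variant folder sits directly under `models/`.
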
## Minimal iOS integration

```swift
// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```

To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).

## Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18.

Bundle these files in your app:

```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```

In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).

## Choosing a variant

Use the folder name to select the right artifact:

- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory (linear8)
- `coreml_ios18_*`: for iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded because of their quality loss.

## Variant matrix (quick view)

| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.

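If a build script needs to pick a folder automatically, the matrix can be encoded as a lookup. This is only an illustrative sketch: the `target` and `priority` labels are invented here as a shorthand for the "Intended target" and "Notes" columns, and `choose_variant` is a hypothetical helper, not part of the repository.

```python
# (target, priority) -> variant folder, transcribed from the matrix above.
# The priority labels are an assumed shorthand for the "Notes" column.
VARIANTS = {
    ("general", "quality"): "coreml",
    ("general", "speed"): "coreml_int8",
    ("general", "memory"): "coreml_compressed",
    ("ios18", "quality"): "coreml_ios18",
    ("ios18", "balanced"): "coreml_ios18_int8_vocoder_only",
    ("ios18", "speed"): "coreml_ios18_int8_both",
    ("ios18", "memory"): "coreml_compressed_ios18",
}


def choose_variant(target: str, priority: str) -> str:
    """Return the variant folder for a target/priority pair."""
    try:
        return VARIANTS[(target, priority)]
    except KeyError:
        raise ValueError(f"no variant for {target!r} / {priority!r}") from None
```

A pair the matrix does not cover, such as `("general", "balanced")`, raises `ValueError` rather than guessing a folder.
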
## Steps vs. quality (quick guide)

| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |

## Troubleshooting

- **Missing resource error:** Ensure the `resources/` folders are bundled and named exactly as listed in the checklist.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that the iOS deployment target matches your variant (the `coreml_ios18*` and `coreml_compressed_ios18` variants target iOS 18+).

## Tests

The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.

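For orientation, a load-only check along the same lines might look like the sketch below. It is not the repository's test script: it only loads each stage with `coremltools` and reports the declared input names, and it does not run predictions because the input shapes are not documented here. The helper names are hypothetical, and `coremltools` must be installed separately.

```python
from pathlib import Path

STAGES = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]


def stage_paths(variant_dir: Path) -> dict[str, Path]:
    """Map each stage name to its expected .mlpackage path."""
    return {s: variant_dir / f"{s}_mlprogram.mlpackage" for s in STAGES}


def load_all_stages(variant_dir: Path) -> dict[str, list[str]]:
    """Load every stage and return the input names each model declares."""
    import coremltools as ct  # lazy import; needs `pip install coremltools`

    inputs = {}
    for stage, path in stage_paths(variant_dir).items():
        model = ct.models.MLModel(str(path))
        spec = model.get_spec()
        inputs[stage] = [feature.name for feature in spec.description.input]
    return inputs
```

Running `load_all_stages(Path("models/coreml_int8"))` assumes the variant folder sits directly under `models/`; adjust the path to your checkout.
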
## Attribution and license

This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).