---
license: openrail
language:
- en
- ko
- es
- pt
- fr
pipeline_tag: text-to-speech
tags:
- coreml
- ios
- macos
- tts
- supertonic
- mlprogram
---
# Supertonic-2 CoreML
This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It targets on-device inference and ships several quantization variants, all 8-bit or higher.
**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml
## Code & demo
The GitHub repo contains:
- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`
## What is included
- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: sha256 checksums for all files
- `tests/`: smoke tests for CoreML model loading
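Before bundling, the downloaded files can be checked against `SHA256SUMS`. The following is a minimal sketch, not part of the repository: it assumes `SHA256SUMS` uses the common `sha256sum` layout (`<hex digest>  <relative path>` per line), and the helper names `sha256_of` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root: Path, sums_name: str = "SHA256SUMS") -> list[str]:
    """Return the relative paths that are missing or whose digest differs.

    Assumption: each non-empty line is "<hex digest>  <relative path>".
    """
    failures = []
    for line in (root / sums_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, _, rel_path = line.strip().partition(" ")
        # sha256sum prefixes the path with "*" when run in binary mode.
        rel_path = rel_path.strip().lstrip("*")
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Calling `verify_checksums(Path("."))` from the repository root returns an empty list when every listed file matches. If the actual file layout differs from the assumed one, the parsing line needs adjusting.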
## Quickstart (iOS / macOS)
1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the
reference implementation.
## Required files (checklist)
Bundle the following into your app:
- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`
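The checklist above can be turned into a pre-build check. This is a sketch under stated assumptions: the function name `missing_required_files` is hypothetical, and the caller passes the folder holding the four packages for the chosen variant plus the `resources/` folder, since the exact layout under `models/` is not spelled out here.

```python
from pathlib import Path

# The four CoreML packages listed in the checklist.
STAGE_PACKAGES = [
    "duration_predictor_mlprogram.mlpackage",
    "text_encoder_mlprogram.mlpackage",
    "vector_estimator_mlprogram.mlpackage",
    "vocoder_mlprogram.mlpackage",
]

# Entries expected inside resources/, relative to that folder.
RESOURCE_ENTRIES = [
    "voice_styles",
    "embeddings",
    "onnx/unicode_indexer.json",
    "onnx/tts.json",
]


def missing_required_files(variant_dir: Path, resources_dir: Path) -> list[str]:
    """Return the checklist entries that do not exist on disk."""
    missing = [name for name in STAGE_PACKAGES if not (variant_dir / name).exists()]
    missing += [rel for rel in RESOURCE_ENTRIES if not (resources_dir / rel).exists()]
    return missing
```

An empty result means every checklist entry is present. The check only tests for existence, so it does not catch a truncated or corrupted package.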
## Minimal iOS integration
```swift
// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```
To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).
## Example: iOS 18 `int8_both`
This variant uses int8 weights for multiple stages on iOS 18.
Bundle these files in your app:
```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```
In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).
## Choosing a variant
Use the folder naming to select the right artifact:
- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory (linear8)
- `coreml_ios18`, `coreml_ios18_*`, and `coreml_compressed_ios18`: for the iOS 18 CoreML runtime (>=8-bit only)
4-bit variants are intentionally excluded because of their lower output quality.
## Variant matrix (quick view)
| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |
For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.
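For build scripts, the matrix above can be encoded as a lookup. This is only an illustrative sketch: the function `pick_variant` and the priority labels (`quality`, `balanced`, `speed`, `memory`) are hypothetical names that paraphrase the Notes column, and the folder names are copied from the table.

```python
# (target, priority) -> variant folder, taken from the variant matrix.
VARIANTS = {
    ("general", "quality"): "coreml",
    ("general", "speed"): "coreml_int8",
    ("general", "memory"): "coreml_compressed",
    ("ios18", "quality"): "coreml_ios18",
    ("ios18", "balanced"): "coreml_ios18_int8_vocoder_only",
    ("ios18", "speed"): "coreml_ios18_int8_both",
    ("ios18", "memory"): "coreml_compressed_ios18",
}


def pick_variant(ios18: bool, priority: str) -> str:
    """Return the variant folder for a deployment target and priority."""
    target = "ios18" if ios18 else "general"
    try:
        return VARIANTS[(target, priority)]
    except KeyError:
        # The matrix lists no such combination (e.g. general + balanced).
        raise ValueError(f"no variant for target={target!r}, priority={priority!r}") from None
```

For example, `pick_variant(ios18=True, priority="balanced")` returns `coreml_ios18_int8_vocoder_only`. The general targets have no `balanced` row, so that combination raises `ValueError` rather than guessing.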
## Steps vs. quality (quick guide)
| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |
## Troubleshooting
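- **Missing resource error:** Ensure the `resources/` folders are bundled and named exactly as in the required files checklist.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`) matches the folder bundled in your app.
- **Fails to load on device:** Check that your iOS deployment target matches your variant; the iOS 18 variants are intended for iOS 18 or later.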
## Tests
The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.
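A load-only check in the same spirit might look like the sketch below. It is not the repository's test script: `load_stage_inputs` is a hypothetical helper, it assumes `coremltools` is installed, and it only loads each package and lists its declared input names rather than running dummy inputs.

```python
from pathlib import Path

# Stage names as listed above; package files follow "<stage>_mlprogram.mlpackage".
STAGES = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]


def load_stage_inputs(variant_dir: Path) -> dict[str, list[str]]:
    """Load every stage package and return its declared input names."""
    packages = {stage: variant_dir / f"{stage}_mlprogram.mlpackage" for stage in STAGES}
    missing = [str(path) for path in packages.values() if not path.exists()]
    if missing:
        raise FileNotFoundError(f"missing CoreML packages: {missing}")

    # Imported lazily so the path check above works without coremltools.
    import coremltools as ct

    inputs = {}
    for stage, path in packages.items():
        model = ct.models.MLModel(str(path))
        inputs[stage] = [feature.name for feature in model.get_spec().description.input]
    return inputs
```

Printing the returned dictionary gives a quick view of what each stage expects, which helps when building dummy inputs by hand.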
## Attribution and license
This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).