---
license: openrail
language:
  - en
  - ko
  - es
  - pt
  - fr
pipeline_tag: text-to-speech
tags:
  - coreml
  - ios
  - macos
  - tts
  - supertonic
  - mlprogram
---

# Supertonic-2 CoreML

This repository provides CoreML exports of **Supertonic 2** for macOS and iOS. It focuses on on-device inference with multiple >=8-bit quantization variants.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml

## Code & demo

The GitHub repo contains:

- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`

## What is included

- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: sha256 checksums for all files
- `tests/`: smoke tests for CoreML model loading

## Quickstart (iOS / macOS)

1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the `supertonic-2-coreml` GitHub repo as the reference implementation.
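Before bundling, it is worth verifying downloaded artifacts against `SHA256SUMS`. A minimal sketch, assuming the standard `sha256sum`-style line format (`<hex digest>  <relative path>`); the helper name is illustrative, not part of the repo's tooling:

```python
import hashlib
from pathlib import Path

def verify_sha256sums(sums_file: Path, root: Path) -> list[str]:
    """Return relative paths whose sha256 does not match SHA256SUMS.

    Assumes each non-empty line is '<hex digest>  <relative path>'
    (two spaces; no '*' binary marker).
    """
    mismatches = []
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, _, rel = line.partition("  ")
        actual = hashlib.sha256((root / rel).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel)
    return mismatches
```

On macOS, `shasum -a 256 -c SHA256SUMS` from the repository root should do the same job from the command line.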
## Required files (checklist)

Bundle the following into your app:

- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`

## Minimal iOS integration

```swift
// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```

To select a specific variant, update the CoreML folder name in `TTSService` (the demo defaults to `coreml_int8`).

## Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18. Bundle these files in your app:

```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```

In the Swift demo app, update the CoreML folder name to point at `coreml_ios18_int8_both` (the app defaults to `coreml_int8`).

## Choosing a variant

Use the folder naming to select the right artifact:

- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory footprint (linear8)
- `coreml_ios18_*`: for the iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded due to quality loss.
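To make the folder-name selection mechanical, a small lookup table derived from the naming scheme can help in build scripts. This is a sketch: the function name and preference keys are illustrative and not part of the repo's API, and the mapping only encodes the variant names documented here:

```python
def pick_variant(ios_major: int, prefer: str = "balanced") -> str:
    """Map a target iOS major version and a rough preference
    ('quality' | 'balanced' | 'speed' | 'size') to a variant folder name."""
    if ios_major >= 18:
        return {
            "quality": "coreml_ios18",
            "balanced": "coreml_ios18_int8_vocoder_only",
            "speed": "coreml_ios18_int8_both",
            "size": "coreml_compressed_ios18",
        }[prefer]
    # Pre-iOS-18 runtimes: no int8-vocoder-only middle ground is published,
    # so 'balanced' and 'speed' both fall back to coreml_int8.
    return {
        "quality": "coreml",
        "balanced": "coreml_int8",
        "speed": "coreml_int8",
        "size": "coreml_compressed",
    }[prefer]
```

For example, `pick_variant(18, "speed")` resolves to `coreml_ios18_int8_both`, matching the iOS 18 example above.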
## Variant matrix (quick view)

| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.

## Steps vs. quality (quick guide)

| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |

## Troubleshooting

- **Missing resource error:** Ensure the `resources/` folders are bundled and named exactly as shipped.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that your iOS deployment target matches the chosen variant.

## Tests

The `tests/test_coreml_models.py` script runs a simple smoke test that loads all stages (duration predictor, text encoder, vector estimator, vocoder) with dummy inputs.

## Attribution and license

This CoreML export is derived from **Supertone/supertonic-2**. Model weights are licensed under **OpenRAIL-M** (see `LICENSE`). Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).
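As a lightweight pre-flight analogous to the smoke test, you can confirm a variant folder contains all four stage packages before attempting any CoreML load. A minimal sketch; the helper name is illustrative and the package names are those from the checklist above:

```python
from pathlib import Path

# The four pipeline stages every variant ships, per the required-files checklist.
STAGE_PACKAGES = (
    "duration_predictor_mlprogram.mlpackage",
    "text_encoder_mlprogram.mlpackage",
    "vector_estimator_mlprogram.mlpackage",
    "vocoder_mlprogram.mlpackage",
)

def missing_stages(variant_dir: Path) -> list[str]:
    """Return stage package names absent from variant_dir (empty list == complete)."""
    return [name for name in STAGE_PACKAGES if not (variant_dir / name).exists()]
```

Running this over, say, `models/coreml_ios18_int8_both` before bundling catches an incomplete download early, without needing the CoreML runtime at all.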