How to use aoiandroid/supertonic-2-coreml with the Supertonic Python library:

```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
wav, duration = tts.synthesize(text, voice_style=style)
tts.save_audio(wav, "output.wav")
```
---
license: openrail
language:
- en
- ko
- es
- pt
- fr
pipeline_tag: text-to-speech
tags:
- coreml
- ios
- macos
- tts
- supertonic
- mlprogram
---
# Supertonic-2 CoreML

This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It focuses on on-device inference with multiple >=8-bit quantization variants.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml
## Code & demo

The GitHub repo contains:

- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`
## What is included

- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: SHA-256 checksums for all files
- `tests/`: smoke tests for CoreML model loading
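Downloaded artifacts can be verified against `SHA256SUMS` before bundling. A minimal sketch, assuming the standard `<hash>  <relative path>` sums format (the helper name is illustrative, not part of the repo's tooling):

```python
import hashlib
from pathlib import Path

def verify_sha256sums(sums_file: str, root: str = ".") -> list[str]:
    """Return the paths whose on-disk SHA-256 does not match SHA256SUMS."""
    mismatches = []
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, path = line.split(maxsplit=1)
        digest = hashlib.sha256((Path(root) / path).read_bytes()).hexdigest()
        if digest != expected:
            mismatches.append(path)
    return mismatches
```

An empty return value means every listed file checked out.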
## Quickstart (iOS / macOS)

1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the reference implementation.
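Step 1 amounts to enumerating the variant folders under `models/`. A quick way to list them in a local checkout (the helper name is an assumption, not repo tooling):

```python
from pathlib import Path

def list_variants(models_dir: str = "models") -> list[str]:
    """Return the variant folder names under models/, sorted for stable output."""
    return sorted(p.name for p in Path(models_dir).iterdir() if p.is_dir())
```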
## Required files (checklist)

Bundle the following into your app:

- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`
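A pre-build check that the staged bundle directory actually contains everything on this list can save a failed device run. A minimal sketch, assuming the `resources/` subfolders are copied alongside the packages (the function is illustrative, not repo API):

```python
from pathlib import Path

# Everything the app bundle must contain, per the checklist above.
REQUIRED = [
    "duration_predictor_mlprogram.mlpackage",
    "text_encoder_mlprogram.mlpackage",
    "vector_estimator_mlprogram.mlpackage",
    "vocoder_mlprogram.mlpackage",
    "voice_styles",
    "embeddings",
    "onnx/unicode_indexer.json",
    "onnx/tts.json",
]

def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the required entries that are absent from the bundle directory."""
    root = Path(bundle_dir)
    return [entry for entry in REQUIRED if not (root / entry).exists()]
```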
## Minimal iOS integration

```swift
// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```
To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).
## Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18.
Bundle these files in your app:
```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```
In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).
## Choosing a variant

Use the folder naming to select the right artifact:

- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory (linear8)
- `coreml_ios18_*`: for the iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded due to quality loss.
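The naming convention can be encoded as a small lookup. A sketch that maps a deployment target plus a priority to a variant folder (the function, its arguments, and the "quality"/"speed"/"memory" categories are illustrative assumptions, not repo API):

```python
def pick_variant(ios18: bool, priority: str = "quality") -> str:
    """Map a deployment target and optimization priority to a variant folder name."""
    table = {
        (True, "quality"): "coreml_ios18",
        (True, "speed"): "coreml_ios18_int8_both",
        (True, "memory"): "coreml_compressed_ios18",
        (False, "quality"): "coreml",
        (False, "speed"): "coreml_int8",
        (False, "memory"): "coreml_compressed",
    }
    return table[(ios18, priority)]
```

For example, `pick_variant(True, "memory")` selects `coreml_compressed_ios18`.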
## Variant matrix (quick view)

| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.
## Steps vs. quality (quick guide)

| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |
## Troubleshooting

- **Missing resource error:** Ensure the `resources/` folders are bundled and named exactly as listed in the checklist.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that your iOS deployment target matches your chosen variant.
## Tests

The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.
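The stage-to-package mapping that such a smoke test exercises can be sketched as follows (the dict and helper are illustrative; the actual test lives in `tests/test_coreml_models.py`):

```python
from pathlib import Path

# The four pipeline stages and the CoreML package each one loads.
STAGE_PACKAGES = {
    "duration_predictor": "duration_predictor_mlprogram.mlpackage",
    "text_encoder": "text_encoder_mlprogram.mlpackage",
    "vector_estimator": "vector_estimator_mlprogram.mlpackage",
    "vocoder": "vocoder_mlprogram.mlpackage",
}

def smoke_check(variant_dir: str) -> dict[str, bool]:
    """Report, per stage, whether its package exists in the variant folder.

    A full smoke test would additionally load each package (e.g. with
    coremltools.models.MLModel) and run a prediction on dummy inputs.
    """
    root = Path(variant_dir)
    return {stage: (root / pkg).exists() for stage, pkg in STAGE_PACKAGES.items()}
```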
## Attribution and license

This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).