---
license: openrail
language:
  - en
  - ko
  - es
  - pt
  - fr
pipeline_tag: text-to-speech
tags:
  - coreml
  - ios
  - macos
  - tts
  - supertonic
  - mlprogram
---

# Supertonic-2 CoreML

This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It targets on-device inference and ships several quantization variants, all at 8 bits or wider.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml

## Code & demo

The GitHub repo contains:
- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`

## What is included

- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: sha256 checksums for all files
- `tests/`: smoke tests for CoreML model loading

## Quickstart (iOS / macOS)

1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the
   reference implementation.

## Required files (checklist)

Bundle the following into your app:

- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`
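
The checklist above can be verified at startup before any model is loaded. The following is a minimal sketch, not part of the demo app: `missingRequiredFiles` is a hypothetical helper, and it assumes you pass in the directory that holds the four `.mlpackage` folders and the directory that holds `voice_styles/`, `embeddings/`, and `onnx/`.

```swift
import Foundation

/// Hypothetical helper: returns the names of checklist items that are absent.
/// `variantDir` is assumed to contain the four `.mlpackage` folders,
/// `resourcesDir` is assumed to contain `voice_styles/`, `embeddings/`, and `onnx/`.
func missingRequiredFiles(variantDir: URL, resourcesDir: URL) -> [String] {
    let stages = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]

    // One CoreML package per pipeline stage.
    var required = stages.map {
        variantDir.appendingPathComponent("\($0)_mlprogram.mlpackage")
    }

    // Shared resources listed in the checklist.
    let resources = ["voice_styles", "embeddings", "onnx/unicode_indexer.json", "onnx/tts.json"]
    required += resources.map { resourcesDir.appendingPathComponent($0) }

    // Keep only the entries that do not exist on disk.
    return required
        .filter { !FileManager.default.fileExists(atPath: $0.path) }
        .map { $0.lastPathComponent }
}
```

In an app, both URLs would typically be derived from `Bundle.main.resourceURL`; the exact folder layout depends on how you add the files to your target.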

## Minimal iOS integration

```swift
// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```

To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).

## Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18.

Bundle these files in your app:

```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```

In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).

## Choosing a variant

Use the folder naming to select the right artifact:

- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory (linear8)
- `coreml_ios18_*`: for iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded because of their lower output quality.

## Variant matrix (quick view)

| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.
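
The matrix can also be encoded as a small lookup so the folder name is chosen in one place. This is an illustrative sketch only: `VariantPriority` and `variantFolder` are hypothetical names, and mapping "balanced" to `coreml` on older systems is an assumption, since the table lists no balanced variant for the general target.

```swift
/// Hypothetical trade-off selector; the cases mirror the "Notes" column of the matrix.
enum VariantPriority {
    case quality, balanced, speed, memory
}

/// Returns the variant folder name from the matrix for the given target and priority.
func variantFolder(iOS18OrLater: Bool, priority: VariantPriority) -> String {
    if iOS18OrLater {
        switch priority {
        case .quality:  return "coreml_ios18"
        case .balanced: return "coreml_ios18_int8_vocoder_only"
        case .speed:    return "coreml_ios18_int8_both"
        case .memory:   return "coreml_compressed_ios18"
        }
    } else {
        switch priority {
        // Assumption: no "balanced" general variant exists, so fall back to the baseline.
        case .quality, .balanced: return "coreml"
        case .speed:              return "coreml_int8"
        case .memory:             return "coreml_compressed"
        }
    }
}
```

The `iOS18OrLater` flag could be resolved at runtime with an `if #available(iOS 18, *)` check before calling the function.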

## Steps vs. quality (quick guide)

| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |
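
If you expose this trade-off in your own UI, the three rows above can be captured as named presets. A minimal sketch, assuming a hypothetical `SynthesisPreset` type that is not part of the demo app:

```swift
/// Hypothetical presets mirroring the steps table; raw values are the step counts.
enum SynthesisPreset: Int, CaseIterable {
    case fast = 10      // fastest, lowest quality
    case balanced = 20  // balanced speed, good quality
    case best = 30      // slowest, best quality

    /// Step count to pass as the `steps:` argument of `synthesize`.
    var steps: Int { rawValue }
}
```

The preset's `steps` value would then replace the literal `20` in the `synthesize` call shown in the integration example.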

## Troubleshooting

- **Missing resource error:** Ensure the `resources/` folders are bundled and that their names match the checklist exactly.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that the iOS deployment target matches your variant (variants with `ios18` in the folder name target iOS 18+).

## Tests

The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.

## Attribution and license

This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).