How to use aoiandroid/supertonic-2-coreml with the Supertonic Python library:

```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
wav, duration = tts.synthesize(text, voice_style=style)
tts.save_audio(wav, "output.wav")
```
---
license: openrail
language:
- en
- ko
- es
- pt
- fr
pipeline_tag: text-to-speech
tags:
- coreml
- ios
- macos
- tts
- supertonic
- mlprogram
---
# Supertonic-2 CoreML

This repository provides CoreML exports of **Supertonic 2** for macOS and iOS.
It focuses on on-device inference with multiple >=8-bit quantization variants.

**GitHub repo (code + demo app):** https://github.com/Nooder/supertonic-2-coreml
## Code & demo

The GitHub repo contains:

- **Swift demo app** (CoreML pipeline + UI): `supertonic2-coreml-ios-test/`
- **CoreML tooling + tests**: `scripts/`
- **Docs**: `docs/`
## What is included

- `models/`: CoreML model packages by variant (>=8-bit only)
- `resources/`: voice styles, embeddings, and text normalization assets
- `manifest.json`: list of artifacts with checksums and sizes
- `SHA256SUMS`: SHA-256 checksums for all files
- `tests/`: smoke tests for CoreML model loading
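Downloaded artifacts can be verified against `SHA256SUMS` before bundling. A minimal sketch, assuming the standard `<hash>  <relative path>` sums format (the helper name is illustrative, not part of the repo's tooling):

```python
import hashlib
from pathlib import Path

def verify_sha256sums(sums_file: str, root: str = ".") -> list[str]:
    """Return the paths whose on-disk SHA-256 does not match SHA256SUMS."""
    mismatches = []
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, path = line.split(maxsplit=1)
        digest = hashlib.sha256((Path(root) / path).read_bytes()).hexdigest()
        if digest != expected:
            mismatches.append(path)
    return mismatches
```

An empty return value means every listed file checked out.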
## Quickstart (iOS / macOS)

1. Pick a variant from `models/` (see the quant matrix in `docs/quant-matrix.md`).
2. Bundle the corresponding CoreML packages and `resources/` into your app.
3. Use the Swift demo app in the GitHub repo `supertonic-2-coreml` as the reference implementation.
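Step 1 amounts to enumerating the variant folders under `models/`. A quick way to list them in a local checkout (the helper name is an assumption, not repo tooling):

```python
from pathlib import Path

def list_variants(models_dir: str = "models") -> list[str]:
    """Return the variant folder names under models/, sorted for stable output."""
    return sorted(p.name for p in Path(models_dir).iterdir() if p.is_dir())
```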
## Required files (checklist)

Bundle the following into your app:

- CoreML packages for your chosen variant:
  - `duration_predictor_mlprogram.mlpackage`
  - `text_encoder_mlprogram.mlpackage`
  - `vector_estimator_mlprogram.mlpackage`
  - `vocoder_mlprogram.mlpackage`
- `resources/voice_styles/`
- `resources/embeddings/`
- `resources/onnx/unicode_indexer.json`
- `resources/onnx/tts.json`
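A pre-build check that the staged bundle directory actually contains everything on this list can save a failed device run. A minimal sketch, assuming the `resources/` subfolders are copied alongside the packages (the function is illustrative, not repo API):

```python
from pathlib import Path

# Everything the app bundle must contain, per the checklist above.
REQUIRED = [
    "duration_predictor_mlprogram.mlpackage",
    "text_encoder_mlprogram.mlpackage",
    "vector_estimator_mlprogram.mlpackage",
    "vocoder_mlprogram.mlpackage",
    "voice_styles",
    "embeddings",
    "onnx/unicode_indexer.json",
    "onnx/tts.json",
]

def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the required entries that are absent from the bundle directory."""
    root = Path(bundle_dir)
    return [entry for entry in REQUIRED if not (root / entry).exists()]
```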
## Minimal iOS integration

```swift
// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)
```
To select a specific variant, update the CoreML folder name in
`TTSService` (the demo defaults to `coreml_int8`).
## Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18.
Bundle these files in your app:
```
Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json
```
In the Swift demo app, update the CoreML folder name to point at
`coreml_ios18_int8_both` (the app defaults to `coreml_int8`).
## Choosing a variant

Use the folder naming to select the right artifact:

- `coreml_int8`: faster, lower fidelity
- `coreml_compressed`: smaller memory (linear8)
- `coreml_ios18_*`: for the iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded due to quality loss.
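The naming convention can be encoded as a small lookup. A sketch that maps a deployment target plus a priority to a variant folder (the function, its arguments, and the "quality"/"speed"/"memory" categories are illustrative assumptions, not repo API):

```python
def pick_variant(ios18: bool, priority: str = "quality") -> str:
    """Map a deployment target and optimization priority to a variant folder name."""
    table = {
        (True, "quality"): "coreml_ios18",
        (True, "speed"): "coreml_ios18_int8_both",
        (True, "memory"): "coreml_compressed_ios18",
        (False, "quality"): "coreml",
        (False, "speed"): "coreml_int8",
        (False, "memory"): "coreml_compressed",
    }
    return table[(ios18, priority)]
```

For example, `pick_variant(True, "memory")` selects `coreml_compressed_ios18`.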
## Variant matrix (quick view)

| Variant folder | Quantization (by name) | Intended target | Notes |
| --- | --- | --- | --- |
| `coreml` | full precision (mixed) | general | baseline quality |
| `coreml_int8` | int8 (all stages) | general | faster, lower fidelity |
| `coreml_compressed` | linear8 | general | smaller memory |
| `coreml_ios18` | full precision (mlprogram) | iOS 18+ | best quality on iOS 18 |
| `coreml_ios18_int8_vocoder_only` | int8 (vocoder only) | iOS 18+ | balanced |
| `coreml_ios18_int8_both` | int8 (multiple stages) | iOS 18+ | fastest, more loss |
| `coreml_compressed_ios18` | linear8 | iOS 18+ | smallest memory |

For deeper guidance, see `docs/compatibility-matrix.md` and `docs/quant-matrix.md`.
## Steps vs. quality (quick guide)

| Steps | Speed | Quality |
| --- | --- | --- |
| 10 | fastest | lowest |
| 20 | balanced | good |
| 30 | slowest | best |
## Troubleshooting

- **Missing resource error:** Ensure the `resources/` folders are bundled and named exactly as listed in the checklist.
- **Model not found:** Confirm the CoreML folder name (e.g., `coreml_ios18_int8_both`).
- **Fails to load on device:** Check that your iOS deployment target matches your chosen variant.
## Tests

The `tests/test_coreml_models.py` script runs a simple smoke test that loads
all stages (duration predictor, text encoder, vector estimator, vocoder) with
dummy inputs.
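The stage-to-package mapping that such a smoke test exercises can be sketched as follows (the dict and helper are illustrative; the actual test lives in `tests/test_coreml_models.py`):

```python
from pathlib import Path

# The four pipeline stages and the CoreML package each one loads.
STAGE_PACKAGES = {
    "duration_predictor": "duration_predictor_mlprogram.mlpackage",
    "text_encoder": "text_encoder_mlprogram.mlpackage",
    "vector_estimator": "vector_estimator_mlprogram.mlpackage",
    "vocoder": "vocoder_mlprogram.mlpackage",
}

def smoke_check(variant_dir: str) -> dict[str, bool]:
    """Report, per stage, whether its package exists in the variant folder.

    A full smoke test would additionally load each package (e.g. with
    coremltools.models.MLModel) and run a prediction on dummy inputs.
    """
    root = Path(variant_dir)
    return {stage: (root / pkg).exists() for stage, pkg in STAGE_PACKAGES.items()}
```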
## Attribution and license

This CoreML export is derived from **Supertone/supertonic-2**.
Model weights are licensed under **OpenRAIL-M** (see `LICENSE`).
Sample code is MIT-licensed (see `NOTICE` and `UPSTREAM.md`).