---
license: openrail
language:
- en
- ko
- ja
- ar
- bg
- cs
- da
- de
- el
- es
- et
- fi
- fr
- hi
- hr
- hu
- id
- it
- lt
- lv
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sv
- tr
- uk
- vi
pipeline_tag: text-to-speech
tags:
- text-to-speech
- speech-synthesis
- tts
- onnx
- multilingual
- on-device
library_name: supertonic
---
# Supertonic 3 | Lightning Fast, On-Device, Accurate TTS

<p align="center">
<a href="https://huggingface.co/spaces/Supertone/supertonic-3"><img src="https://img.shields.io/badge/Demo-Hugging_Face-yellow?style=for-the-badge" alt="Demo"></a>
<a href="https://github.com/supertone-inc/supertonic"><img src="https://img.shields.io/badge/Code-GitHub-black?style=for-the-badge&logo=github" alt="Code"></a>
<a href="https://pypi.org/project/supertonic/"><img src="https://img.shields.io/badge/Python-SDK-blue?style=for-the-badge&logo=python" alt="Python SDK"></a>
</p>
**Supertonic** is a lightweight text-to-speech system for local inference. It runs entirely on your device with ONNX Runtime; no cloud call is required for synthesis.

**Supertonic 3** expands the open-weight release from 5 to **31 languages**, improves reading stability, and reduces repeat/skip failures.
## Quick Start

Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.

```bash
pip install supertonic
```
```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")

tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
```
## What's New in Supertonic 3

- **31 languages**: expanded from the 5-language Supertonic 2 release.
- **More stable reading**: fewer repeat and skip failures, especially on short and long utterances.
- **Higher speaker similarity**: improved similarity across the shared-language set compared with Supertonic 2.
- **Expression tags**: supports simple tags such as `<laugh>`, `<breath>`, and `<sigh>`.
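Expression tags are written inline in the input text. The sketch below assumes the Quick Start API; the `tag` helper and the exact tag set are illustrative conveniences taken from the list above, not part of the SDK.

```python
# Expression tags listed in this model card; other tags are not assumed to work.
SUPPORTED_TAGS = {"laugh", "breath", "sigh"}


def tag(name: str) -> str:
    """Format an expression tag, guarding against unsupported names (hypothetical helper)."""
    if name not in SUPPORTED_TAGS:
        raise ValueError(f"unsupported expression tag: {name}")
    return f"<{name}>"


def synthesize_with_tags() -> None:
    """Sketch: pass tagged text straight to the SDK (triggers a model download when run)."""
    from supertonic import TTS

    tts = TTS(auto_download=True)
    style = tts.get_voice_style(voice_name="M1")
    text = f"Well, that was unexpected {tag('laugh')} but let's keep going. {tag('breath')}"
    wav, duration = tts.synthesize(text, voice_style=style, lang="en")
    tts.save_audio(wav, "tagged_output.wav")
```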
## Performance Highlights

Supertonic 3 is designed for practical on-device inference: compact enough to run locally while staying competitive with much larger open TTS systems.

### Reading Accuracy

<p align="center">
<img src="img/metrics/s3_vs_measured_wer_range_voxcpm2.png" alt="Supertonic 3 reading accuracy compared with measured model ranges and VoxCPM2">
</p>

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.
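For reference, WER is the word-level edit distance between a reference transcript and an ASR transcript of the synthesized audio, normalized by reference length. A minimal self-contained implementation (an illustration of the metric, not the evaluation harness used for the chart):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

CER is the same computation over characters rather than words, which is why it is preferred for languages without whitespace word boundaries.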
### Supertonic 2 to Supertonic 3

<p align="center">
<img src="img/metrics/supertonic2_vs_3_comparison.png" alt="Supertonic 2 and Supertonic 3 comparison">
</p>

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.
### Runtime Footprint

<p align="center">
<img src="img/metrics/runtime_cpu_gpu_latency_memory.png" alt="Supertonic CPU runtime compared with GPU baselines">
</p>

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on an A100 GPU, and uses substantially less memory. Because it does not require a GPU, local, browser, and edge deployment is much easier.
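A common way to quantify synthesis speed is the real-time factor (RTF): seconds of audio produced per second of wall-clock compute. The sketch below assumes the Quick Start API; `benchmark` is a hypothetical helper and is not called at import time, since constructing `TTS` downloads model assets.

```python
import time


def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """RTF > 1 means audio is produced faster than real time."""
    return audio_seconds / wall_seconds


def benchmark(text: str = "The quick brown fox jumps over the lazy dog.") -> float:
    """Sketch: time one synthesis call on CPU (triggers a model download when run)."""
    from supertonic import TTS

    tts = TTS(auto_download=True)
    style = tts.get_voice_style(voice_name="M1")
    start = time.perf_counter()
    wav, duration = tts.synthesize(text, voice_style=style, lang="en")
    elapsed = time.perf_counter() - start
    return real_time_factor(duration, elapsed)
```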
### Model Size

<p align="center">
<img src="img/metrics/model_size_comparison.png" alt="Model size comparison">
</p>

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
## Supported Languages

| Code | Language | Code | Language | Code | Language | Code | Language |
|------|----------|------|----------|------|----------|------|----------|
| `en` | English | `ko` | Korean | `ja` | Japanese | `ar` | Arabic |
| `bg` | Bulgarian | `cs` | Czech | `da` | Danish | `de` | German |
| `el` | Greek | `es` | Spanish | `et` | Estonian | `fi` | Finnish |
| `fr` | French | `hi` | Hindi | `hr` | Croatian | `hu` | Hungarian |
| `id` | Indonesian | `it` | Italian | `lt` | Lithuanian | `lv` | Latvian |
| `nl` | Dutch | `pl` | Polish | `pt` | Portuguese | `ro` | Romanian |
| `ru` | Russian | `sk` | Slovak | `sl` | Slovenian | `sv` | Swedish |
| `tr` | Turkish | `uk` | Ukrainian | `vi` | Vietnamese | | |
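The same Quick Start call works across languages by changing the `lang` code. A minimal sketch, assuming the Quick Start API; the code set mirrors the table above, and validating before synthesis is a convenience for this example, not an SDK requirement.

```python
# Language codes from the table above (31 total).
SUPPORTED_LANGS = {
    "en", "ko", "ja", "ar", "bg", "cs", "da", "de", "el", "es", "et",
    "fi", "fr", "hi", "hr", "hu", "id", "it", "lt", "lv", "nl", "pl",
    "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def synthesize_multilingual(samples: dict[str, str]) -> None:
    """Sketch: synthesize one clip per language (triggers a model download when run)."""
    from supertonic import TTS

    tts = TTS(auto_download=True)
    style = tts.get_voice_style(voice_name="M1")
    for lang, text in samples.items():
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language code: {lang}")
        wav, _ = tts.synthesize(text, voice_style=style, lang=lang)
        tts.save_audio(wav, f"output_{lang}.wav")
```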
## License

This project's sample code is released under the MIT License. See the [GitHub repository](https://github.com/supertone-inc/supertonic) for details.

The accompanying model is released under the OpenRAIL-M License. See the [LICENSE](https://huggingface.co/Supertone/supertonic-3/blob/main/LICENSE) file in this repository for details.

This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the [PyTorch license](https://docs.pytorch.org/FBGEMM/general/License.html) for details.

Copyright (c) 2026 Supertone Inc.