Instructions to use Vocabook/supertonic-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Supertonic
How to use Vocabook/supertonic-2 with Supertonic:
from supertonic import TTS tts = TTS(auto_download=True) style = tts.get_voice_style(voice_name="M1") text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance." wav, duration = tts.synthesize(text, voice_style=style) tts.save_audio(wav, "output.wav")
- Notebooks
- Google Colab
- Kaggle
| license: openrail | |
| language: | |
| - en | |
| - ko | |
| - es | |
| - pt | |
| - fr | |
| pipeline_tag: text-to-speech | |
| tags: | |
| - text-to-speech | |
| - speech-synthesis | |
| - tts | |
| - onnx | |
| library_name: supertonic | |
| # Supertonic 2 — Lightning Fast, On-Device TTS, Multilingual TTS | |
|  | |
| <p align="center"> | |
| <a href="https://huggingface.co/spaces/Supertone/supertonic-2"><img src="https://img.shields.io/badge/🤗_Demo-Hugging_Face-yellow?style=for-the-badge" alt="Demo"></a> | |
| <a href="https://github.com/supertone-inc/supertonic"><img src="https://img.shields.io/badge/💻_Code-GitHub-black?style=for-the-badge&logo=github" alt="Code"></a> | |
| </p> | |
| **Supertonic** is a lightning-fast, on-device text-to-speech system designed for **extreme performance** with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device—no cloud, no API calls, no privacy concerns. | |
| ## What's New in Supertonic 2 | |
| **Supertonic 2** extends multilingual capabilities while maintaining the same inference speed and efficiency as the original. | |
| ### 🌍 Multilingual Support | |
| | Language | Code | | |
| |----------|------| | |
| | English | `en` | | |
| | Korean | `ko` | | |
| | Spanish | `es` | | |
| | Portuguese | `pt` | | |
| | French | `fr` | | |
| ### ⚡ Same Speed, More Languages | |
| - **No speed degradation**: Supertonic 2 delivers the same ultra-fast inference speed as the original—up to **167× faster than real-time** | |
| - **Efficient architecture**: Only **66M parameters**, optimized for on-device deployment | |
| - **Cross-language consistency**: All supported languages share the same model architecture and inference pipeline | |
| ## Performance | |
| We evaluated Supertonic's performance (with 2 inference steps) using two key metrics across input texts of varying lengths: Short (59 chars), Mid (152 chars), and Long (266 chars). | |
| **Metrics:** | |
| - **Characters per Second**: Measures throughput by dividing the number of input characters by the time required to generate audio. Higher is better. | |
| - **Real-time Factor (RTF)**: Measures the time taken to synthesize audio relative to its duration. Lower is better (e.g., RTF of 0.1 means it takes 0.1 seconds to generate one second of audio). | |
| ### Characters per Second | |
| | System | Short (59 chars) | Mid (152 chars) | Long (266 chars) | | |
| |--------|-----------------|----------------|-----------------| | |
| | **Supertonic** (M4 pro - CPU) | 912 | 1048 | 1263 | | |
| | **Supertonic** (M4 pro - WebGPU) | 996 | 1801 | 2509 | | |
| | **Supertonic** (RTX4090) | 2615 | 6548 | 12164 | | |
| | `API` [ElevenLabs Flash v2.5](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) | 144 | 209 | 287 | | |
| | `API` [OpenAI TTS-1](https://platform.openai.com/docs/guides/text-to-speech) | 37 | 55 | 82 | | |
| | `API` [Gemini 2.5 Flash TTS](https://ai.google.dev/gemini-api/docs/speech-generation) | 12 | 18 | 24 | | |
| | `API` [Supertone Sona speech 1](https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech) | 38 | 64 | 92 | | |
| | `Open` [Kokoro](https://github.com/hexgrad/kokoro/) | 104 | 107 | 117 | | |
| | `Open` [NeuTTS Air](https://github.com/neuphonic/neutts-air) | 37 | 42 | 47 | | |
| > **Notes:** | |
| > `API` = Cloud-based API services (measured from Seoul) | |
| > `Open` = Open-source models | |
| > Supertonic (M4 pro - CPU) and (M4 pro - WebGPU): Tested with ONNX | |
| > Supertonic (RTX4090): Tested with PyTorch model | |
| > Kokoro: Tested on M4 Pro CPU with ONNX | |
| > NeuTTS Air: Tested on M4 Pro CPU with Q8-GGUF | |
| ### Real-time Factor | |
| | System | Short (59 chars) | Mid (152 chars) | Long (266 chars) | | |
| |--------|-----------------|----------------|-----------------| | |
| | **Supertonic** (M4 pro - CPU) | 0.015 | 0.013 | 0.012 | | |
| | **Supertonic** (M4 pro - WebGPU) | 0.014 | 0.007 | 0.006 | | |
| | **Supertonic** (RTX4090) | 0.005 | 0.002 | 0.001 | | |
| | `API` [ElevenLabs Flash v2.5](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) | 0.133 | 0.077 | 0.057 | | |
| | `API` [OpenAI TTS-1](https://platform.openai.com/docs/guides/text-to-speech) | 0.471 | 0.302 | 0.201 | | |
| | `API` [Gemini 2.5 Flash TTS](https://ai.google.dev/gemini-api/docs/speech-generation) | 1.060 | 0.673 | 0.541 | | |
| | `API` [Supertone Sona speech 1](https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech) | 0.372 | 0.206 | 0.163 | | |
| | `Open` [Kokoro](https://github.com/hexgrad/kokoro/) | 0.144 | 0.124 | 0.126 | | |
| | `Open` [NeuTTS Air](https://github.com/neuphonic/neutts-air) | 0.390 | 0.338 | 0.343 | | |
| <details> | |
| <summary><b>Additional Performance Data (5-step inference)</b></summary> | |
| <br> | |
| **Characters per Second (5-step)** | |
| | System | Short (59 chars) | Mid (152 chars) | Long (266 chars) | | |
| |--------|-----------------|----------------|-----------------| | |
| | **Supertonic** (M4 pro - CPU) | 596 | 691 | 850 | | |
| | **Supertonic** (M4 pro - WebGPU) | 570 | 1118 | 1546 | | |
| | **Supertonic** (RTX4090) | 1286 | 3757 | 6242 | | |
| **Real-time Factor (5-step)** | |
| | System | Short (59 chars) | Mid (152 chars) | Long (266 chars) | | |
| |--------|-----------------|----------------|-----------------| | |
| | **Supertonic** (M4 pro - CPU) | 0.023 | 0.019 | 0.018 | | |
| | **Supertonic** (M4 pro - WebGPU) | 0.024 | 0.012 | 0.010 | | |
| | **Supertonic** (RTX4090) | 0.011 | 0.004 | 0.002 | | |
| </details> | |
| ## License | |
| This project’s sample code is released under the MIT License. - see the [LICENSE](https://github.com/supertone-inc/supertonic?tab=MIT-1-ov-file) for details. | |
| The accompanying model is released under the OpenRAIL-M License. - see the [LICENSE](https://huggingface.co/Supertone/supertonic-2/blob/main/LICENSE) file for details. | |
| This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. - see the [LICENSE](https://docs.pytorch.org/FBGEMM/general/License.html) for details. | |
| Copyright (c) 2026 Supertone Inc. |