Supertonic-3 Quantized (ONNX)
Quantized ONNX derivative of Supertone/supertonic-3 for on-device TTS. Drop-in replacement for the official ONNX assets: same Python / C++ / Node SDK, smaller weights.
31 languages (en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi).
Variants
| Folder | Total size | Method | Quality | Use case |
|---|---|---|---|---|
| `fp16/` | 191 MB | All 4 models in float16 (`onnxruntime.transformers.float16`) | ~99% of fp32 | On-device desktop/mobile, ORT/CoreML/DirectML |
voice_styles/ is shared and unchanged from upstream.
Why no int8 variant?
Dynamic int8 quantization was tested on `vector_estimator` (the largest model, a ConvNeXt-based diffusion U-Net). However, the resulting model emits `ConvInteger` nodes, and many ORT CPU builds do not implement that op:
- Common error: `NOT_IMPLEMENTED: Could not find an implementation for ConvInteger(10) node`
- Affects: onnxruntime-node, minimal builds, older ORT versions, some mobile builds
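Before shipping a quantized graph, it can help to check whether it contains `ConvInteger` nodes at all. The following is a minimal sketch that assumes the `onnx` package is installed. `find_ops` and `find_ops_in_file` are hypothetical helper names. Only top-level graph nodes are scanned; subgraphs inside `If`/`Loop` are not visited.

```python
from typing import Iterable, List, Sequence


def find_ops(nodes: Iterable, op_types: Sequence[str] = ("ConvInteger",)) -> List[str]:
    """Return names of nodes whose op_type is in op_types.

    Works on any objects exposing .name and .op_type, such as onnx NodeProto.
    Unnamed nodes are reported as "<unnamed>".
    """
    wanted = set(op_types)
    return [node.name or "<unnamed>" for node in nodes if node.op_type in wanted]


def find_ops_in_file(path: str, op_types: Sequence[str] = ("ConvInteger",)) -> List[str]:
    """Scan the top-level graph of an ONNX file (subgraphs are not visited)."""
    import onnx  # lazy import so find_ops() works without onnx installed

    # Weights are not needed to inspect op types, so skip external data files.
    model = onnx.load(path, load_external_data=False)
    return find_ops(model.graph.node, op_types)
```

A non-empty result from `find_ops_in_file("vector_estimator.int8.onnx")` indicates that the affected ORT builds will likely fail with the error above. The file name in that call is hypothetical.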
Restricting dynamic quantization to MatMul ops (skipping Conv) gives only a ~6% size reduction, because `vector_estimator` is Conv-dominated. Static int8 (QDQ) quantization with calibration would work universally. However, it requires capturing intermediate diffusion states as calibration data, which is out of scope for this repo.
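The ~6% figure follows from simple arithmetic. An int8 pass stores each quantized fp32 weight in 1 byte instead of 4, so it saves 3/4 of the bytes it touches and nothing more. This is a back-of-the-envelope sketch that assumes weights dominate file size and ignores scale/zero-point tensors. The 8% MatMul share is back-solved from the quoted ~6%; it was not measured. `size_reduction` is a hypothetical helper. In `onnxruntime.quantization.quantize_dynamic`, restricting quantization to MatMul corresponds to passing `op_types_to_quantize=["MatMul"]`.

```python
def size_reduction(quantized_fraction: float, src_bytes: int = 4, dst_bytes: int = 1) -> float:
    """Fractional file-size reduction when only part of the weights is re-encoded.

    quantized_fraction: share of the model's weight bytes covered by the quantized ops.
    src_bytes / dst_bytes: bytes per weight before and after (fp32=4, fp16=2, int8=1).
    Ignores scale/zero-point tensors and graph overhead.
    """
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be in [0, 1]")
    return quantized_fraction * (1 - dst_bytes / src_bytes)


if __name__ == "__main__":
    # Assumption: MatMul weights are ~8% of vector_estimator's bytes (back-solved, not measured).
    print(f"MatMul-only int8: {size_reduction(0.08):.0%} smaller")  # 0.08 * 0.75 = 0.06
    # Every weight in int8 (only possible where ConvInteger is supported):
    print(f"Full int8:        {size_reduction(1.0):.0%} smaller")
    # Every weight in fp16, i.e. the shipped variant (380 MB -> 191 MB):
    print(f"Full fp16:        {size_reduction(1.0, dst_bytes=2):.0%} smaller")
```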
For now, fp16 is the recommended on-device variant: universal ORT compatibility, near-lossless quality, ~50% smaller than fp32.
Layout
```
fp16/onnx/
  text_encoder.onnx
  duration_predictor.onnx
  vector_estimator.onnx
  vocoder.onnx
  tts.json
  unicode_indexer.json
voice_styles/
  {F1,F2,F3,F4,F5,M1,M2,M3,M4,M5}.json
```
- `fp16/onnx/`: the 4 ONNX weight files, the architecture config (`tts.json`), and the tokenizer table (`unicode_indexer.json`).
- `voice_styles/`: voice embeddings, identical to upstream.
Download
hf download Kyumdroid/supertonic-3-quant \
--include="fp16/onnx/**" --include="voice_styles/**" \
--local-dir ./supertonic
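After downloading, you can confirm that every file in the layout is present. This is a minimal sketch that uses only the standard library. `expected_files` and `missing_files` are hypothetical helpers, and `./supertonic` matches the `--local-dir` above.

```python
from pathlib import Path
from typing import List

# File names from the layout section of this card.
ONNX_FILES = [
    "text_encoder.onnx",
    "duration_predictor.onnx",
    "vector_estimator.onnx",
    "vocoder.onnx",
    "tts.json",
    "unicode_indexer.json",
]
VOICES = [f"{gender}{i}" for gender in ("F", "M") for i in range(1, 6)]


def expected_files() -> List[str]:
    """Relative paths the fp16 variant plus shared voice styles should contain."""
    paths = [f"fp16/onnx/{name}" for name in ONNX_FILES]
    paths += [f"voice_styles/{voice}.json" for voice in VOICES]
    return paths


def missing_files(root: str) -> List[str]:
    """Return the expected relative paths that are not regular files under root."""
    base = Path(root)
    return [rel for rel in expected_files() if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("./supertonic")
    print("OK" if not missing else "Missing:\n  " + "\n  ".join(missing))
```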
Voice catalog
Display names from the official Supertonic demo Space:
| File | Name | Description |
|---|---|---|
| `M1.json` | Alex | Lively, upbeat male |
| `M2.json` | James | Deep, composed male |
| `M3.json` | Robert | Polished, authoritative male (demo default) |
| `M4.json` | Sam | Soft, neutral, youthful male |
| `M5.json` | Daniel | Warm, soothing male |
| `F1.json` | Sarah | Calm, steady female |
| `F2.json` | Lily | Bright, cheerful female |
| `F3.json` | Jessica | Broadcast-style female |
| `F4.json` | Olivia | Crisp, confident female |
| `F5.json` | Emily | Gentle, soothing female |
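The catalog can also be kept as a small lookup, so callers can pick a voice by display name or by file stem. This is a sketch: `VOICE_NAMES` and `voice_style_path` are hypothetical helpers. The display names come from the demo Space as listed above, and the `./supertonic` root assumes the download command shown earlier.

```python
from pathlib import Path

# File stem -> display name, copied from the voice catalog table.
VOICE_NAMES = {
    "M1": "Alex", "M2": "James", "M3": "Robert", "M4": "Sam", "M5": "Daniel",
    "F1": "Sarah", "F2": "Lily", "F3": "Jessica", "F4": "Olivia", "F5": "Emily",
}


def voice_style_path(name: str, root: str = "./supertonic") -> Path:
    """Resolve a file stem ("M3") or display name ("Robert") to its style JSON path."""
    key = name.strip()
    if key.upper() in VOICE_NAMES:
        stem = key.upper()
    else:
        # Case-insensitive reverse lookup by display name.
        by_name = {display.lower(): s for s, display in VOICE_NAMES.items()}
        try:
            stem = by_name[key.lower()]
        except KeyError:
            raise ValueError(f"unknown voice: {name!r}") from None
    return Path(root) / "voice_styles" / f"{stem}.json"
```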
Conversion
`fp16/` was produced with `onnxruntime.transformers.float16.convert_float_to_float16`, using:
- `keep_io_types=True` (fp32 IO for SDK compatibility)
- `op_block_list=['Cast']` (avoids Cast type mismatches)
- ONNX `shape_inference.infer_shapes_path` applied to the upstream fp32 models first
Conversion script available in the project repository.
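The settings listed above can be combined into a short conversion sketch. This is not the project's script; it is a minimal reconstruction from those three settings. It assumes that `onnx` and `onnxruntime` are installed and that the upstream fp32 models sit in a local `upstream/onnx/` directory. `convert_model`, `output_path`, and the directory names are hypothetical. All other `convert_float_to_float16` arguments are left at their defaults.

```python
from pathlib import Path

MODELS = ["text_encoder", "duration_predictor", "vector_estimator", "vocoder"]


def output_path(src: Path, dst_dir: Path) -> Path:
    """Keep the upstream file name so the SDK finds the converted model."""
    return dst_dir / src.name


def convert_model(src: Path, dst_dir: Path) -> Path:
    """fp32 ONNX -> fp16 ONNX with fp32 graph inputs/outputs."""
    # Lazy imports so the path helper above works without these packages.
    import onnx
    from onnx import shape_inference
    from onnxruntime.transformers.float16 import convert_float_to_float16

    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = output_path(src, dst_dir)

    # 1) Run shape inference on the fp32 model first, writing to a temporary file.
    inferred = dst.with_name(dst.stem + ".inferred.onnx")
    shape_inference.infer_shapes_path(str(src), str(inferred))

    # 2) Convert to fp16, keeping fp32 IO for the SDK (keep_io_types=True)
    #    and blocking Cast nodes from conversion to avoid type mismatches.
    model = onnx.load(str(inferred))
    model_fp16 = convert_float_to_float16(model, keep_io_types=True, op_block_list=["Cast"])
    onnx.save(model_fp16, str(dst))
    inferred.unlink()
    return dst


if __name__ == "__main__":
    src_dir, dst_dir = Path("upstream/onnx"), Path("fp16/onnx")
    for name in MODELS:
        print("wrote", convert_model(src_dir / f"{name}.onnx", dst_dir))
```

The `tts.json` and `unicode_indexer.json` files are not ONNX graphs, so they would be copied alongside the converted models unchanged.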
Performance (Apple Silicon CPU)
Short Korean utterance, ORT CPU EP only:
| Variant | Size | Synthesis time |
|---|---|---|
| fp32 baseline (upstream) | 380 MB | ~0.7 s |
| fp16 | 191 MB | ~0.7 s |
The CPU EP runs fp16 models by upcasting to fp32, so wall-clock time is similar. For fp16-native acceleration, use the CoreML EP (macOS) or the DirectML EP (Windows): 2-3× faster and ~50% lower RAM.
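You can order the provider list before creating a session, so that an fp16-capable execution provider is used when available and the CPU EP otherwise. This is a sketch. The provider strings are the standard ONNX Runtime identifiers, but which ones are available depends on the installed package (for example, DirectML requires `onnxruntime-directml`). `pick_providers` is a hypothetical helper.

```python
from typing import List, Sequence

# Preference order: fp16-native accelerators first, CPU EP as the universal fallback.
PREFERRED = ["CoreMLExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the PREFERRED providers that are available, always ending with the CPU EP."""
    chosen = [p for p in PREFERRED if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    session = ort.InferenceSession("supertonic/fp16/onnx/vocoder.onnx", providers=providers)
    print("using", session.get_providers())
```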
License
OpenRAIL-M, inherited from Supertone/supertonic-3. See LICENSE.
Use restrictions (Attachment A) apply: no impersonation/deepfakes without consent, no AI-generated content without disclosure, no medical advice, no illegal activities, etc.
Credits
- Original model: Supertone/supertonic-3 by Supertone Inc.
- Quantization (this repo): fp16 ONNX for Electron / desktop on-device deployment