Supertonic-3 Quantized (ONNX)

Quantized ONNX derivative of Supertone/supertonic-3 for on-device TTS. Drop-in replacement for the official ONNX assets — same Python / C++ / Node SDK, smaller weights.

31 languages (en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi).

Variants

Folder	Total size	Method	Quality	Use case
`fp16/`	191 MB	All 4 models float16 (`onnxruntime.transformers.float16`)	≈99% of fp32	On-device desktop/mobile, ORT/CoreML/DirectML

voice_styles/ is shared and unchanged from upstream.

Why no int8 variant?

Dynamic int8 quantization was tested on vector_estimator (the largest model, a ConvNeXt-based diffusion U-Net), but the resulting model contains ConvInteger nodes, which many ORT CPU builds do not implement:

Common error: NOT_IMPLEMENTED: Could not find an implementation for ConvInteger(10) node
Affects: onnxruntime-node, minimal builds, older ORT versions, some mobile builds

Restricting dynamic quantization to MatMul ops (skipping Conv) reduces size by only ~6% because vector_estimator is Conv-dominated. Static int8 (QDQ) with calibration would work universally, but it requires capturing intermediate diffusion states as calibration data, which is out of scope for this repo.

For now, fp16 is the recommended on-device variant: universal ORT compatibility, near-lossless quality, ~50% smaller than fp32.
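
To check whether a quantized export is affected by the ConvInteger problem, one option is to scan the graph for integer ops before shipping it. The sketch below is an illustration, not this repo's tooling: `count_integer_ops`, `integer_ops_in_model`, and `quantize_matmul_only` are hypothetical helper names, and only top-level graph nodes are inspected (subgraphs inside control-flow ops are not). The last helper shows the MatMul-only dynamic quantization described above, using `onnxruntime.quantization.quantize_dynamic`.

```python
from collections import Counter

# Integer ops that dynamic int8 quantization can emit. ConvInteger is the one
# reported as unimplemented in the error above.
INTEGER_OPS = {"ConvInteger", "MatMulInteger"}


def count_integer_ops(op_types):
    """Count integer ops in an iterable of ONNX op-type names."""
    counts = Counter(op_types)
    return {op: counts[op] for op in sorted(INTEGER_OPS) if counts[op]}


def integer_ops_in_model(model_path):
    """Load an ONNX file and count integer ops among its top-level nodes."""
    import onnx  # lazy import: only needed when a real model is inspected

    model = onnx.load(str(model_path), load_external_data=False)
    return count_integer_ops(node.op_type for node in model.graph.node)


def quantize_matmul_only(src, dst):
    """Dynamic int8 quantization restricted to MatMul, leaving Conv in float."""
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=str(src),
        model_output=str(dst),
        op_types_to_quantize=["MatMul"],  # skip Conv so no ConvInteger is emitted
        weight_type=QuantType.QInt8,
    )
```

An empty result from `integer_ops_in_model` suggests the export avoids the ops named above; it does not by itself guarantee that every ORT build can run the model.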

Layout

fp16/onnx/
    text_encoder.onnx
    duration_predictor.onnx
    vector_estimator.onnx
    vocoder.onnx
    tts.json
    unicode_indexer.json

voice_styles/
    {F1,F2,F3,F4,F5,M1,M2,M3,M4,M5}.json

fp16/onnx/ — 4 ONNX weights + architecture config (tts.json) + tokenizer table (unicode_indexer.json).
voice_styles/ — voice embeddings, identical to upstream.
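
A quick way to confirm that a local copy matches the layout above is to check for each expected file. This is a minimal sketch using only the standard library; `EXPECTED_FILES` and `missing_files` are hypothetical names introduced here, and the file list is copied from the layout above.

```python
from pathlib import Path

# File list taken from the layout above: 6 files under fp16/onnx/ plus 10 voices.
EXPECTED_FILES = [
    "fp16/onnx/text_encoder.onnx",
    "fp16/onnx/duration_predictor.onnx",
    "fp16/onnx/vector_estimator.onnx",
    "fp16/onnx/vocoder.onnx",
    "fp16/onnx/tts.json",
    "fp16/onnx/unicode_indexer.json",
] + [
    f"voice_styles/{voice}.json"
    for voice in ("F1", "F2", "F3", "F4", "F5", "M1", "M2", "M3", "M4", "M5")
]


def missing_files(root):
    """Return the expected relative paths that are not present under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, `missing_files("./supertonic")` returns an empty list when the directory is complete.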

Download

hf download Kyumdroid/supertonic-3-quant \
  --include="fp16/onnx/**" --include="voice_styles/**" \
  --local-dir ./supertonic
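
The same download can be done from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function with the same include patterns as the command above. `is_selected` and `download` are hypothetical helpers; `is_selected` only approximates the pattern filtering locally with `fnmatch` so the patterns can be previewed without network access.

```python
from fnmatch import fnmatch

REPO_ID = "Kyumdroid/supertonic-3-quant"
# Same patterns as the --include flags in the CLI command above.
ALLOW_PATTERNS = ["fp16/onnx/**", "voice_styles/**"]


def is_selected(path):
    """Local preview: would this repo-relative path match an include pattern?"""
    return any(fnmatch(path, pattern) for pattern in ALLOW_PATTERNS)


def download(local_dir="./supertonic"):
    """Fetch only the fp16 models and voice styles into local_dir."""
    from huggingface_hub import snapshot_download  # lazy import

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
```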

Voice catalog

Display names from the official Supertonic demo Space:

File	Name	Description
`M1.json`	Alex	Lively, upbeat male
`M2.json`	James	Deep, composed male
`M3.json`	Robert	Polished, authoritative male (demo default)
`M4.json`	Sam	Soft, neutral, youthful male
`M5.json`	Daniel	Warm, soothing male
`F1.json`	Sarah	Calm, steady female
`F2.json`	Lily	Bright, cheerful female
`F3.json`	Jessica	Broadcast-style female
`F4.json`	Olivia	Crisp, confident female
`F5.json`	Emily	Gentle, soothing female

Conversion

fp16/ was produced via onnxruntime.transformers.float16.convert_float_to_float16 with:

keep_io_types=True (fp32 IO for SDK compatibility)
op_block_list=['Cast'] (avoid Cast type mismatch)
onnx.shape_inference.infer_shapes_path applied to the upstream fp32 models first

Conversion script available in the project repository.
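
As a rough illustration of the steps listed above, the sketch below runs shape inference and then converts one model to fp16 with the stated options. It is not the project's conversion script: `MODEL_NAMES`, `conversion_plan`, and `convert_one` are hypothetical names, and it assumes that the upstream fp32 files use the same names as in this repo and are stored as single files without external data.

```python
import tempfile
from pathlib import Path

# Assumed to match the upstream file names (same as this repo's layout).
MODEL_NAMES = ["text_encoder", "duration_predictor", "vector_estimator", "vocoder"]


def conversion_plan(src_dir, dst_dir):
    """Pair each fp32 source path with its fp16 destination path."""
    return [
        (Path(src_dir) / f"{name}.onnx", Path(dst_dir) / f"{name}.onnx")
        for name in MODEL_NAMES
    ]


def convert_one(src, dst):
    """Shape-infer an fp32 model, then convert it to fp16 with fp32 IO."""
    import onnx  # lazy imports: only needed when converting real models
    from onnx import shape_inference
    from onnxruntime.transformers.float16 import convert_float_to_float16

    with tempfile.TemporaryDirectory() as tmp:
        inferred = Path(tmp) / "inferred.onnx"
        # Step 1: shape inference on the upstream fp32 model.
        shape_inference.infer_shapes_path(str(src), str(inferred))
        model = onnx.load(str(inferred))

    # Step 2: fp16 conversion with the options listed above.
    fp16_model = convert_float_to_float16(
        model,
        keep_io_types=True,      # keep fp32 inputs/outputs for SDK compatibility
        op_block_list=["Cast"],  # leave Cast nodes untouched
    )
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    onnx.save(fp16_model, str(dst))
```

Calling `convert_one(src, dst)` for each pair returned by `conversion_plan("upstream/onnx", "fp16/onnx")` would cover all four models; the directory names here are placeholders.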

Performance (Apple Silicon CPU)

Short Korean utterance, ORT CPU EP only:

Variant	Size	Synthesis time
fp32 baseline (upstream)	380 MB	~0.7 s
fp16	191 MB	~0.7 s

The CPU EP upcasts fp16 to fp32 for computation, so wall-clock time is similar to the fp32 baseline. For fp16-native acceleration, use the CoreML EP (macOS) or the DirectML EP (Windows): 2-3× faster with ~50% lower RAM.
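
The provider choice described above can be sketched as follows. `pick_providers` and `open_session` are hypothetical helpers, not part of the Supertonic SDK. The provider names are the ones ONNX Runtime uses for CoreML, DirectML, and CPU, and whether the accelerated providers are available depends on the installed onnxruntime build.

```python
def pick_providers(available, system):
    """Prefer CoreML on macOS and DirectML on Windows, with CPU as fallback."""
    preferred = {
        "Darwin": "CoreMLExecutionProvider",
        "Windows": "DmlExecutionProvider",
    }.get(system)
    providers = []
    if preferred is not None and preferred in available:
        providers.append(preferred)
    providers.append("CPUExecutionProvider")  # always keep a CPU fallback
    return providers


def open_session(model_path):
    """Create an InferenceSession using the best provider available here."""
    import platform

    import onnxruntime as ort  # lazy import

    providers = pick_providers(ort.get_available_providers(), platform.system())
    return ort.InferenceSession(str(model_path), providers=providers)
```

Keeping `CPUExecutionProvider` last in the list lets ONNX Runtime fall back to CPU for any node the accelerated provider does not handle.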

License

OpenRAIL-M, inherited from Supertone/supertonic-3. See LICENSE.

Use restrictions (Attachment A) apply: no impersonation/deepfakes without consent, no AI-generated content without disclosure, no medical advice, no illegal activities, etc.

Credits

Original model: Supertone/supertonic-3 by Supertone Inc.
Quantization (this repo): fp16 ONNX for Electron / desktop on-device deployment
