# Supertonic – FP16 vs INT8 Quantized Benchmark (by Shadow0482)

This README documents a simple benchmark comparing **FP16** and **INT8** quantized
versions of the [Supertonic](https://huggingface.co/Supertone/supertonic) TTS
pipeline, using the quantized models hosted at:

- Quantized models repo: **Shadow0482/supertonic-quantized**
- This model card: **Shadow0482/supertonic-quantized**

All tests were run in Google Colab on CPU using the official `py/example_onnx.py`
script from the Supertonic GitHub repository.

---

## Test text

The same text was used for both FP16 and INT8 runs:

> Greetings! You are listening to your newly quantized model. I have been squished, squeezed, compressed, minimized, optimized, digitized, and lightly traumatized to save disk space. The testing framework automatically verifies my integrity, measures how much weight I lost, and checks if I can still talk without glitching into a robot dolphin. If you can hear this clearly, the quantization ritual was a complete success.

---

## Results

| Variant | Precision | ONNX directory | Time (s) | Output WAV | Status |
|---------|-----------|----------------|---------:|------------|--------|
| FP16    | float16   | `onnx_fp16/`   | 0.290    | `NONE`     | FAILED |
| INT8    | int8      | `onnx_int8/`   | 0.246    | `NONE`     | FAILED |

> Note:
> - Exact times will vary depending on Colab hardware, runtime load, and ONNX Runtime version.
> - The goal of this benchmark is to confirm that both FP16 and INT8 quantized models
>   load correctly and produce intelligible audio for the same input text.
> - In the run recorded above, neither variant produced an output WAV, so both rows are marked FAILED.

---
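
A sketch of how per-run wall-clock times like the `Time (s)` column can be collected,
assuming each variant is launched as a separate subprocess (the `time_run` helper is
hypothetical, not part of the Supertonic repo):

```python
import subprocess
import sys
import time

def time_run(cmd):
    """Run a command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start, proc.returncode

# Illustrative: time a single FP16 run of the example script.
# elapsed, rc = time_run([sys.executable, "py/example_onnx.py",
#                         "--onnx-dir", "onnx_fp16", "--n-test", "1"])
```

Timing a subprocess this way includes interpreter startup and model loading, so it
slightly overstates pure inference time; the bias is the same for both variants, so
the comparison still holds.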

## How to reproduce this benchmark

In a fresh Colab notebook:

1. Install dependencies and run the benchmark cell (the one that created this README).
2. Make sure you have write access to both:
   - `Shadow0482/supertonic`
   - `Shadow0482/supertonic-quantized`

The core of the benchmark is simply:

```bash
python py/example_onnx.py \
  --onnx-dir onnx_fp16 \
  --voice-style assets/voice_styles/M1.json \
  --text "... your test text ..." \
  --n-test 1 \
  --save-dir results_fp16

python py/example_onnx.py \
  --onnx-dir onnx_int8 \
  --voice-style assets/voice_styles/M1.json \
  --text "... the same test text ..." \
  --n-test 1 \
  --save-dir results_int8
```

Where `onnx_fp16/` and `onnx_int8/` contain drop-in copies of the original
Supertonic ONNX files, but converted/quantized to FP16 or INT8 respectively.
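
Since a variant only counts as passing when it yields audible output, a quick sanity
check on the saved file catches the `NONE`/FAILED case from the results table. A
stdlib-only sketch (`wav_duration` and the output path are illustrative, not part of
the benchmark script):

```python
import wave

def wav_duration(path):
    """Duration of a WAV file in seconds (0.0 means no audio frames)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())

# Illustrative: require at least one second of audio before calling a run a pass.
# assert wav_duration("results_fp16/output.wav") > 1.0
```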

---

## Model notes

* **FP16** models are converted from the original FP32 weights using float16 conversion.
* **INT8** models are dynamically quantized (MatMul/Gemm) using ONNX Runtime.
* The quantized models live in `Shadow0482/supertonic-quantized` and can be plugged
  into the Supertonic pipeline via the `--onnx-dir` argument in `example_onnx.py`.

---

## License

The original Supertonic code and models are licensed under their respective
licenses (MIT code + OpenRAIL-M model). This benchmark and quantized packaging
follow the same licensing terms.