Add quantized QNN model libraries for QCS6490

Compiled .so libraries for HTP backend deployment:
- libtext_encoder_htp.so
- libvector_estimator_htp.so
- libvocoder_htp.so
- libduration_predictor_htp.so

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (6)

.gitattributes +1 -0
README.md +149 -8
libduration_predictor_htp.so +3 -0
libtext_encoder_htp.so +3 -0
libvector_estimator_htp.so +3 -0
libvocoder_htp.so +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.so filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,11 +1,152 @@
 ---
 license: openrail
-base_model:
-- Supertone/supertonic
-language:
-- en
-pipeline_tag: text-to-speech
+base_model: Supertone/supertonic-2
 tags:
-- qualcomm
-- LPAI
----
+  - tts
+  - text-to-speech
+  - qualcomm
+  - qnn
+  - quantized
+  - qcs6490
+  - hexagon
+pipeline_tag: text-to-speech
+---
+# Supertonic TTS Quantization for QCS6490
+A step-by-step guide to quantizing the [Supertonic TTS](https://huggingface.co/Supertone/supertonic) model for Qualcomm QCS6490 using QAIRT/QNN.
+## Requirements
+- QAIRT/QNN SDK **v2.37**
+- Python 3.8+
+- Target device: **QCS6490**
+## Pipeline Architecture
+```text
+                text + style
+                     │
+         ┌───────────┴───────────┐
+         │                       │
+  duration_predictor        text_encoder
+         │                       │
+    duration (scalar)       text_emb (1,128,256)
+         │                       │
+   latent_mask (1,1,256)         │
+         └───────────┬───────────┘
+                     │
+              vector_estimator (10 diffusion steps)
+                     │
+               denoised_latent
+                     │
+                  vocoder
+                     │
+              audio (44.1kHz)
+```
+The `duration_predictor` outputs a single scalar representing the total speech duration. This is post-processed into a `latent_mask` that tells the `vector_estimator` how many of the 256 fixed-size latent frames are active speech vs padding.
+## Workflow
+### 1. Input Preparation
+Prepare calibration inputs for model quantization.
+`Input_Preparation.ipynb`
+### 2. Step-by-Step Quantization
+Convert ONNX models to QNN format with quantization for HTP backend.
+`Supertonic_TTS_StepbyStep.ipynb`
+### 3. Correlation Verification
+Verify quantized model outputs against reference using cosine similarity.
+`Correlation_Verification.ipynb`
+## Project Structure
+```text
+├── Input_Preparation.ipynb         # Prepare calibration inputs
+├── Supertonic_TTS_StepbyStep.ipynb # ONNX → QNN quantization guide
+├── Correlation_Verification.ipynb  # Output verification
+├── assets/                         # ONNX models (git submodule)
+│   └── onnx/
+│       ├── text_encoder.onnx
+│       ├── duration_predictor.onnx
+│       ├── vector_estimator.onnx
+│       └── vocoder.onnx
+├── QNN_Models/                     # Quantized QNN models (.bin, .cpp)
+├── QNN_Model_lib/                  # QNN runtime libraries (aarch64)
+├── qnn_calibration/                # Calibration data for verification
+├── inputs/                         # Prepared input data
+└── board_output/                   # Inference outputs from board
+```
+## Models
+| Model              | Description                                 |
+|--------------------|---------------------------------------------|
+| text_encoder       | Encodes text tokens with style embedding    |
+| duration_predictor | Predicts total speech duration (scalar)     |
+| vector_estimator   | Diffusion-based latent generator (10 steps) |
+| vocoder            | Converts latent to audio waveform           |
+### ONNX Models (Source)
+Located in `assets/onnx/` (git submodule from Hugging Face):
+- `text_encoder.onnx`
+- `duration_predictor.onnx`
+- `vector_estimator.onnx`
+- `vocoder.onnx`
+### QNN Models (Quantized)
+Located in `QNN_Models/`:
+- `text_encoder_htp.bin` / `.cpp`
+- `vector_estimator_htp.bin` / `.cpp`
+- `vocoder_htp.bin` / `.cpp`
+### Compiled Libraries (Ready for Deployment)
+Located in `QNN_Model_lib/aarch64-oe-linux-gcc11.2/`:
+- `libtext_encoder_htp.so`
+- `libvector_estimator_htp.so`
+- `libvocoder_htp.so`
+- `libduration_predictor_htp.so`
+These `.so` files are compiled from the `.cpp` sources and are ready to be deployed (via SCP) to the board for inference.
+> **Note:** The `duration_predictor` is quantized and compiled but not used in the current calibration-based workflow since `latent_mask` is precomputed. For an end-to-end pipeline with arbitrary text input, the duration predictor must run first to dynamically generate the `latent_mask`.
+## Getting Started
+1. Clone with submodules:
+   ```bash
+   git clone --recurse-submodules https://github.com/dev-ansh-r/Supertonic-TTS-QCS6490
+   ```
+2. Follow the notebooks in order:
+   - `Input_Preparation.ipynb`
+   - `Supertonic_TTS_StepbyStep.ipynb`
+   - `Correlation_Verification.ipynb`
+## Note
+> An inference script and a sample application are not provided yet. Optimization work is ongoing and will be released soon.
+## License
+This model inherits the licensing from [Supertone/supertonic-2](https://huggingface.co/Supertone/supertonic-2):
+- **Model:** OpenRAIL-M License
+- **Code:** MIT License
+Copyright (c) 2026 Supertone Inc. (original model)
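The README says the `duration_predictor` scalar is post-processed into a `(1, 1, 256)` `latent_mask` marking which latent frames are active speech and which are padding. A minimal sketch of that step, assuming the scalar is a duration in seconds and that each latent frame covers a fixed number of audio samples at 44.1 kHz. `SAMPLES_PER_FRAME` and `duration_to_latent_mask` are hypothetical; the real frame size depends on Supertonic's vocoder hop size and latent compression, which this commit does not state.

```python
import math
from typing import List

MAX_LATENT_FRAMES = 256   # fixed latent length, from the latent_mask shape (1, 1, 256)
SAMPLE_RATE = 44_100      # output sample rate in Hz, from the pipeline diagram
SAMPLES_PER_FRAME = 3072  # HYPOTHETICAL: audio samples covered by one latent frame


def duration_to_latent_mask(duration_s: float,
                            samples_per_frame: int = SAMPLES_PER_FRAME,
                            max_frames: int = MAX_LATENT_FRAMES) -> List[List[List[float]]]:
    """Return a (1, 1, max_frames) nested-list mask: 1.0 = active speech, 0.0 = padding.

    Assumes duration_s is the predicted total speech duration in seconds.
    """
    # Number of latent frames needed to cover the predicted audio length.
    active = math.ceil(duration_s * SAMPLE_RATE / samples_per_frame)
    # Clamp into the fixed latent window; anything longer is truncated.
    active = max(0, min(active, max_frames))
    row = [1.0] * active + [0.0] * (max_frames - active)
    return [[row]]
```

With the hypothetical frame size above, a 2-second prediction covers 88,200 samples, i.e. 29 active frames followed by 227 padding frames. If the notebooks feed NumPy arrays, the nested list can be converted with `numpy.asarray(mask, dtype=numpy.float32)`.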

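The Correlation Verification step compares quantized model outputs against reference outputs using cosine similarity. A minimal pure-Python sketch of that metric over two flattened output tensors; the function name is illustrative, and how the notebook loads board outputs is not shown in this commit.

```python
import math
from typing import Sequence


def cosine_similarity(reference: Sequence[float], quantized: Sequence[float]) -> float:
    """Cosine similarity between two flattened output tensors of equal length.

    1.0 means the quantized output points in exactly the same direction as the
    reference; values near 0 mean the outputs are essentially uncorrelated.
    """
    if len(reference) != len(quantized):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(r * q for r, q in zip(reference, quantized))
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_q = math.sqrt(sum(q * q for q in quantized))
    if norm_r == 0.0 or norm_q == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_r * norm_q)
```

Cosine similarity ignores overall scale, so it flags direction errors introduced by quantization but not a uniform gain offset; pairing it with an absolute error metric catches both.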
libduration_predictor_htp.so ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0b581701a8b4a10cbca55d428fe94de005dc8e6322e4de596228b45da5d2bee6
+size 1027296

libtext_encoder_htp.so ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f8f8d69ce4c134f340ddf4bee8a2f2be3e43f47e55c446dfe68a2d46d19b5d8d
+size 7819640

libvector_estimator_htp.so ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7ea703c8e37afd08f37d27d975ac110b1b20c78fa4d467d5e1c7c89f5ec1c036
+size 34901904

libvocoder_htp.so ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1aba5afbd21acb5fe69ef7b048a3b794ee410907458b9180470067e604796db
+size 25864496
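Each `.so` is committed as a Git LFS pointer with three lines: `version`, `oid sha256:<hex>`, and `size` in bytes. After the real binaries are fetched (for example with `git lfs pull`), a library can be checked against its pointer before it is copied to the board. A minimal sketch following that pointer format; `parse_lfs_pointer`, `matches_pointer`, and `verify_file` are illustrative names, not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(blob: bytes, pointer_text: str) -> bool:
    """True if the blob's byte size and SHA-256 match the pointer's 'size' and 'oid'."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Compare the cheap size check first, then the content hash.
    return len(blob) == int(fields["size"]) and hashlib.sha256(blob).hexdigest() == expected_hex


def verify_file(path: str, pointer_text: str) -> bool:
    """Check a downloaded file on disk against its LFS pointer text."""
    return matches_pointer(Path(path).read_bytes(), pointer_text)
```

For example, passing the three `libvocoder_htp.so` pointer lines above as `pointer_text` should return `True` only for a 25,864,496-byte file whose SHA-256 is `f1aba5af…96db`; a `False` result usually means the checkout still contains the pointer text instead of the binary.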