SD 1.5 + ControlNet-Canny: Precompiled QNN ONNX (Snapdragon 8 Gen 3)
Mirror of Qualcomm's published precompiled QNN ONNX bundle for SD 1.5 + ControlNet-Canny, built for the Hexagon V75 NPU on Snapdragon 8 Gen 3. It is used by Sona Forge, a local-first Android image-generation app, to run SD 1.5 inference entirely on the device's NPU instead of the CPU.
This repository is a redistribution of artefacts that Qualcomm publishes
under their own model cards on aihub.qualcomm.com.
We do not retrain, refine, or otherwise alter the weights. The mirror exists
so the Sona Forge Android app has a stable, version-pinned download URL
under our org's namespace; upstream URLs at qaihub-public-assets.s3.amazonaws.com
are tied to a release version (currently v0.52.0) that may rotate.
Files
| File | Size | Purpose |
|---|---|---|
| `text_encoder.onnx` | 733 B | EPContext wrapper, references `text_encoder_qairt_context.bin` |
| `text_encoder_qairt_context.bin` | 156 MB | QAIRT 2.42 HTP context binary, w8a16 |
| `unet.onnx` | 9.4 KB | EPContext wrapper, references `unet_qairt_context.bin` |
| `unet_qairt_context.bin` | 841 MB | UNet w/ 13 ControlNet residual inputs, w8a16 |
| `controlnet.onnx` | 7.4 KB | EPContext wrapper, references `controlnet_qairt_context.bin` |
| `controlnet_qairt_context.bin` | 352 MB | ControlNet-Canny encoder, w8a16 |
| `vae.onnx` | 873 B | EPContext wrapper, references `vae_qairt_context.bin` |
| `vae_qairt_context.bin` | 62 MB | VAE decoder, w8a16 |
| `metadata.json` | 17 KB | Input/output shapes + quantization scale/zero-point per tensor |
Total: 1.4 GB on disk.
Loading via ONNX Runtime QNN EP
The .onnx files are tiny EPContext wrappers: each one references the matching
`*_qairt_context.bin` and tells ORT's QNN Execution Provider to dispatch
inference to the Hexagon V75 backend. Both files of a pair must sit in the
same directory at load time.
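```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

val env = OrtEnvironment.getEnvironment()
val options = OrtSession.SessionOptions().apply {
    addQnn(mapOf("backend_path" to "libQnnHtp.so", "htp_arch" to "75")) // HTP backend, Hexagon V75
}
val unet = env.createSession("unet.onnx", options) // use an absolute path on Android
```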
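Because a wrapper cannot run without its context binary, it can help to fail fast before calling `createSession`. The sketch below assumes the `<name>.onnx` / `<name>_qairt_context.bin` naming shown in the Files table; `requireContextBinary` is a hypothetical helper, not part of ONNX Runtime.

```kotlin
import java.io.File

// Hypothetical helper: checks that an EPContext wrapper such as unet.onnx has
// its context binary (unet_qairt_context.bin, following the Files table naming)
// in the same directory, and returns that binary's File.
fun requireContextBinary(wrapper: File): File {
    require(wrapper.extension == "onnx") { "Not an .onnx wrapper: ${wrapper.name}" }
    require(wrapper.isFile) { "Missing wrapper: ${wrapper.path}" }
    val dir = wrapper.absoluteFile.parentFile
    val bin = File(dir, "${wrapper.nameWithoutExtension}_qairt_context.bin")
    require(bin.isFile) { "Missing ${bin.name} next to ${wrapper.name}" }
    return bin
}
```

On Android this would typically be called on the absolute path under the app's files directory, just before `env.createSession(wrapper.absolutePath, options)`.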
Inputs / outputs are uint16 quantized (NHWC)
Unlike Sona Forge's CPU-EP packs (sd15-controlnet-canny-fp16), these
take uint16 quantized tensors in NHWC layout, with per-tensor
scale and zero_point in metadata.json. Host code must:
- Quantize FP32/FP16 inputs to uint16 using the metadata's scale and zero_point.
- Permute NCHW → NHWC before feeding (e.g. `[1, 4, 64, 64]` → `[1, 64, 64, 4]`).
- Reverse both steps for outputs (uint16 → FP32, NHWC → NCHW).
metadata.json lists the shape, dtype, and quantization parameters of every
input and output. See the Qualcomm AI Hub ControlNet-Canny model card for the
full pre/post-processing reference.
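The host-side steps listed above can be sketched in plain Kotlin. This assumes the common affine convention `q = clamp(round(x / scale) + zero_point, 0, 65535)` and `x = (q - zero_point) * scale`; Qualcomm's QNN tooling typically stores an offset (`offset = -zero_point`) instead, so confirm the sign convention against metadata.json and the AI Hub model card. uint16 values are carried in a `ShortArray` as raw 16-bit patterns, and all function names are illustrative.

```kotlin
import kotlin.math.roundToInt

// Quantize floats to uint16 with an affine (scale, zeroPoint) mapping,
// clamping to [0, 65535]. Each result is stored as a raw 16-bit pattern.
fun quantizeU16(x: FloatArray, scale: Float, zeroPoint: Int): ShortArray =
    ShortArray(x.size) { i ->
        val q = (x[i] / scale).roundToInt().toLong() + zeroPoint
        q.coerceIn(0L, 65535L).toInt().toShort()
    }

// Inverse mapping: reinterpret each Short as unsigned, then rescale.
fun dequantizeU16(q: ShortArray, scale: Float, zeroPoint: Int): FloatArray =
    FloatArray(q.size) { i -> ((q[i].toInt() and 0xFFFF) - zeroPoint) * scale }

// NCHW -> NHWC for a dense [n, c, h, w] buffer, e.g. [1, 4, 64, 64] -> [1, 64, 64, 4].
fun nchwToNhwc(src: FloatArray, n: Int, c: Int, h: Int, w: Int): FloatArray {
    require(src.size == n * c * h * w) { "Size ${src.size} != $n*$c*$h*$w" }
    val dst = FloatArray(src.size)
    for (b in 0 until n) for (ch in 0 until c) for (y in 0 until h) for (x in 0 until w) {
        dst[((b * h + y) * w + x) * c + ch] = src[((b * c + ch) * h + y) * w + x]
    }
    return dst
}

// NHWC -> NCHW: the exact inverse of nchwToNhwc, applied to outputs.
fun nhwcToNchw(src: FloatArray, n: Int, c: Int, h: Int, w: Int): FloatArray {
    require(src.size == n * c * h * w) { "Size ${src.size} != $n*$c*$h*$w" }
    val dst = FloatArray(src.size)
    for (b in 0 until n) for (ch in 0 until c) for (y in 0 until h) for (x in 0 until w) {
        dst[((b * c + ch) * h + y) * w + x] = src[((b * h + y) * w + x) * c + ch]
    }
    return dst
}
```

For a UNet latent the input path would be `nchwToNhwc(latent, 1, 4, 64, 64)` followed by `quantizeU16` with that tensor's scale and zero_point; outputs go through `dequantizeU16` and then `nhwcToNchw`. A production path would more likely write straight into a direct `ByteBuffer` than allocate intermediate arrays.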
Provenance
```
Stable Diffusion v1.5 weights
    CompVis (https://github.com/CompVis/stable-diffusion)
    CreativeML Open RAIL-M
              │
              ▼
ControlNet-Canny adapter
    lllyasviel (https://github.com/lllyasviel/ControlNet)
    Apache 2.0
              │
              ▼
Quantization (w8a16) + QAIRT 2.42 HTP compilation for V75
    Qualcomm AI Hub (https://aihub.qualcomm.com)
    Use restrictions per Qualcomm AI Hub Terms (see LICENSE / ATTRIBUTION.md)
              │
              ▼
This repository
    sona-forge mirror, no further modifications
```
Tooling versions baked in
- QAIRT: `2.42.0.251225135753_193295`
- ONNX Runtime: `1.24.3`
- Quantization: `w8a16` (8-bit weights, 16-bit activations)
- Target SoC: Snapdragon 8 Gen 3 (Hexagon V75) only
These artefacts will not load on:
- Hexagon V73 (8 Gen 2 / 7 Gen 4): needs a `_8gen2` variant
- Hexagon V79 (8 Elite / 8 Elite Gen 5): needs a `_8elite` variant
- Any non-Qualcomm SoC
For other targets, mirror the matching zip from
qaihub-public-assets.s3.us-west-2.amazonaws.com or recompile via the
Qualcomm AI Hub Python API.
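An app that ships several variants has to pick one at runtime. The sketch below assumes the SoC part number is available (on Android, `Build.SOC_MODEL` reports it on API 31+) and that 8 Gen 2, 8 Gen 3 and 8 Elite report `SM8550`, `SM8650` and `SM8750` respectively; that mapping, the function name, and the empty suffix for this V75 bundle are illustrative assumptions, and the list is deliberately incomplete.

```kotlin
// Hypothetical selector: maps a Qualcomm SoC part number to the bundle-name
// suffix from the list above. Assumed mapping (verify per device):
//   SM8550 (8 Gen 2) -> Hexagon V73 -> "_8gen2"
//   SM8650 (8 Gen 3) -> Hexagon V75 -> ""  (this repository)
//   SM8750 (8 Elite) -> Hexagon V79 -> "_8elite"
// Returns null for anything unknown, including non-Qualcomm SoCs, so the
// caller can fall back to another path (for example a CPU-EP pack).
fun qnnVariantSuffix(socModel: String): String? =
    when (socModel.trim().uppercase()) {
        "SM8550" -> "_8gen2"
        "SM8650" -> ""
        "SM8750" -> "_8elite"
        else -> null
    }
```

Returning `null` rather than guessing keeps an unlisted chip from loading a context binary compiled for the wrong Hexagon version.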
License
See LICENSE and ATTRIBUTION.md. The bundle is governed by the union of:
- CreativeML Open RAIL-M (SD 1.5 base)
- Apache 2.0 (ControlNet adapter)
- Qualcomm AI Hub Terms of Service (compiled artefacts)
Use restrictions from the upstream model cards apply, including (but not limited to) prohibitions on biometric identification, social scoring, generation of CSAM, and harassment. Read the upstream cards before deploying.