SD 1.5 + ControlNet-Canny: Precompiled QNN ONNX (Snapdragon 8 Gen 3)

Mirror of Qualcomm's published precompiled QNN ONNX bundle for SD 1.5 + ControlNet-Canny, built for the Hexagon V75 NPU on Snapdragon 8 Gen 3. Used by Sona Forge, a local-first Android image-generation app, to run SD 1.5 inference entirely on the device's NPU instead of the CPU.

This repository is a redistribution of artefacts that Qualcomm publishes under their own model cards on aihub.qualcomm.com. We do not retrain, refine, or otherwise alter the weights. The mirror exists so the Sona Forge Android app has a stable, version-pinned download URL under our org's namespace; upstream URLs at qaihub-public-assets.s3.amazonaws.com are tied to a release version (currently v0.52.0) that may rotate.

Files

File                              Size     Purpose
text_encoder.onnx                 733 B    EPContext wrapper, references text_encoder_qairt_context.bin
text_encoder_qairt_context.bin    156 MB   QAIRT 2.42 HTP context binary, w8a16
unet.onnx                         9.4 KB   EPContext wrapper, references unet_qairt_context.bin
unet_qairt_context.bin            841 MB   UNet w/ 13 ControlNet residual inputs, w8a16
controlnet.onnx                   7.4 KB   EPContext wrapper, references controlnet_qairt_context.bin
controlnet_qairt_context.bin      352 MB   ControlNet-Canny encoder, w8a16
vae.onnx                          873 B    EPContext wrapper, references vae_qairt_context.bin
vae_qairt_context.bin             62 MB    VAE decoder, w8a16
metadata.json                     17 KB    Input/output shapes + quantization scale/zero-point per tensor

Total: 1.4 GB on disk.

Loading via ONNX Runtime QNN EP

The .onnx files are tiny EPContext wrappers: each one references the matching *_qairt_context.bin and tells ORT's QNN Execution Provider to dispatch inference to the Hexagon V75 backend. Both files of a pair must live side-by-side at load time.

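import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
// Route inference to the Hexagon HTP backend; htp_arch 75 matches the V75 target.
val env = OrtEnvironment.getEnvironment()
val options = OrtSession.SessionOptions().apply {
    addQnn(mapOf("backend_path" to "libQnnHtp.so", "htp_arch" to "75"))
}
val unet = env.createSession("unet.onnx", options)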
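
Because each wrapper only loads when its context binary sits next to it, a pre-flight check before creating sessions can catch an incomplete download early. The following is a minimal sketch, not part of Sona Forge or ONNX Runtime: the helper name missingBundleFiles and the modelDir parameter are hypothetical, and the file names are taken from the file table above.

```kotlin
import java.io.File

// Base names of the four wrapper/context-binary pairs listed in the file table.
val BUNDLE_PAIRS = listOf("text_encoder", "unet", "controlnet", "vae")

/**
 * Hypothetical helper: returns the names of bundle files that are absent from
 * modelDir. An empty list means every .onnx wrapper has its context binary
 * alongside it.
 */
fun missingBundleFiles(modelDir: File): List<String> =
    BUNDLE_PAIRS
        .flatMap { listOf("$it.onnx", "${it}_qairt_context.bin") }
        .filterNot { File(modelDir, it).isFile }
```

A caller would typically refuse to create any session while the returned list is non-empty and re-download the missing entries instead.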

Inputs / outputs are uint16 quantized (NHWC)

Unlike Sona Forge's CPU-EP packs (sd15-controlnet-canny-fp16), these take uint16 quantized tensors in NHWC layout, with per-tensor scale and zero_point in metadata.json. Host code must:

  1. Quantize FP32/FP16 inputs to uint16 using the scale and zero_point from metadata.json.
  2. Permute NCHW → NHWC before feeding ([1, 4, 64, 64] → [1, 64, 64, 4]).
  3. Reverse both steps for outputs (uint16 → FP32, NHWC → NCHW).
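
The three steps above can be sketched in plain Kotlin as follows. This is an illustration under stated assumptions, not the reference pre/post-processing: it assumes the common affine convention q = round(x / scale) + zero_point, which should be checked against metadata.json and the Qualcomm model card (some toolchains store a negated offset instead). QuantParams and the function names are hypothetical, and the layout helpers assume batch size 1.

```kotlin
import kotlin.math.roundToLong

/** Per-tensor affine parameters (hypothetical holder; values come from metadata.json). */
data class QuantParams(val scale: Float, val zeroPoint: Int)

/** FP32 -> uint16. The JVM has no unsigned 16-bit primitive, so the bits live in a ShortArray. */
fun quantize(x: FloatArray, p: QuantParams): ShortArray =
    ShortArray(x.size) { i ->
        // Assumed convention: q = round(x / scale) + zero_point, clamped to [0, 65535].
        ((x[i] / p.scale).roundToLong() + p.zeroPoint).coerceIn(0L, 65535L).toShort()
    }

/** uint16 -> FP32. The mask recovers the unsigned value from the signed Short. */
fun dequantize(q: ShortArray, p: QuantParams): FloatArray =
    FloatArray(q.size) { i -> ((q[i].toInt() and 0xFFFF) - p.zeroPoint) * p.scale }

/** Flat [1, C, H, W] -> flat [1, H, W, C]. */
fun nchwToNhwc(src: FloatArray, c: Int, h: Int, w: Int): FloatArray {
    require(src.size == c * h * w) { "size does not match C*H*W" }
    val dst = FloatArray(src.size)
    for (ci in 0 until c) for (y in 0 until h) for (x in 0 until w) {
        dst[(y * w + x) * c + ci] = src[(ci * h + y) * w + x]
    }
    return dst
}

/** Flat [1, H, W, C] -> flat [1, C, H, W]. */
fun nhwcToNchw(src: FloatArray, c: Int, h: Int, w: Int): FloatArray {
    require(src.size == c * h * w) { "size does not match C*H*W" }
    val dst = FloatArray(src.size)
    for (ci in 0 until c) for (y in 0 until h) for (x in 0 until w) {
        dst[(ci * h + y) * w + x] = src[(y * w + x) * c + ci]
    }
    return dst
}
```

For a latent input this would be used as quantize(nchwToNhwc(latent, 4, 64, 64), params), and an output would go back through nhwcToNchw(dequantize(out, params), c, h, w) with that tensor's own parameters and shape.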

metadata.json enumerates every input/output shape, dtype, and quantization parameters. See the Qualcomm AI Hub ControlNet-Canny model card for the full pre/post-processing reference.

Provenance

Stable Diffusion v1.5 weights
  CompVis (https://github.com/CompVis/stable-diffusion)
  CreativeML Open RAIL-M
        │
        ▼
ControlNet-Canny adapter
  lllyasviel (https://github.com/lllyasviel/ControlNet)
  Apache 2.0
        │
        ▼
Quantization (w8a16) + QAIRT 2.42 HTP compilation for V75
  Qualcomm AI Hub (https://aihub.qualcomm.com)
  Use restrictions per Qualcomm AI Hub Terms; see LICENSE / ATTRIBUTION.md
        │
        ▼
This repository
  sona-forge mirror, no further modifications

Tooling versions baked in

  • QAIRT: 2.42.0.251225135753_193295
  • ONNX Runtime: 1.24.3
  • Quantization: w8a16 (8-bit weights, 16-bit activations)
  • Target SoC: Snapdragon 8 Gen 3 (Hexagon V75 only)

These artefacts will not load on:

  • Hexagon V73 (8 Gen 2 / 7 Gen 4): needs a _8gen2 variant
  • Hexagon V79 (8 Elite / 8 Elite Gen 5): needs a _8elite variant
  • Any non-Qualcomm SoC

For other targets, mirror the matching zip from qaihub-public-assets.s3.us-west-2.amazonaws.com or recompile via the Qualcomm AI Hub Python API.
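
An app that ships this bundle may want to encode the compatibility list above as a guard before downloading 1.4 GB. The sketch below only restates that list as code. The Compat type and checkCompat function are hypothetical, and how the Hexagon architecture number is detected on a given device is deliberately left to the caller.

```kotlin
/** Hypothetical result of checking a device against this bundle's target (Hexagon V75). */
sealed interface Compat {
    object Loads : Compat
    data class NeedsVariant(val suffix: String) : Compat
    object Unsupported : Compat
}

/**
 * Maps a detected Hexagon architecture number to the outcome described in this
 * README. Pass null when the device has no Hexagon NPU or detection failed.
 */
fun checkCompat(hexagonArch: Int?): Compat = when (hexagonArch) {
    75 -> Compat.Loads                      // Snapdragon 8 Gen 3: this bundle
    73 -> Compat.NeedsVariant("_8gen2")     // per the list above
    79 -> Compat.NeedsVariant("_8elite")    // per the list above
    else -> Compat.Unsupported              // other or non-Qualcomm SoC
}
```

Returning the variant suffix rather than a boolean lets the caller decide whether to fetch a different mirror or fall back to the CPU-EP pack.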

License

See LICENSE and ATTRIBUTION.md. The bundle is governed by the union of:

  1. CreativeML Open RAIL-M (SD 1.5 base)
  2. Apache 2.0 (ControlNet adapter)
  3. Qualcomm AI Hub Terms of Service (compiled artefacts)

Use restrictions from the upstream model cards apply, including (but not limited to) prohibitions on biometric identification, social scoring, generation of CSAM, and harassment. Read the upstream cards before deploying.
