Sona Forge — SD 1.5 ControlNet Canny (ONNX FP16)

ONNX FP16 export of the SD 1.5 ControlNet Canny encoder. Used by the Sona Forge Android app for pose / composition stability on portrait avatars (Phase 7). Pair with sona-forge/sd15-ipadapter-fp16 (residual-accepting variant, revision ≥ 1.1.0) and sona-forge/clip-vit-h-14-image-fp16.

ONNX inputs and outputs

| Input | Shape | dtype | Notes |
| --- | --- | --- | --- |
| `sample` | `[batch, 4, 64, 64]` | FP16 | Latent state at step t |
| `timestep` | `[batch]` | FP16 | Scheduler timestep |
| `encoder_hidden_states` | `[batch, 77, 768]` | FP16 | Text embeddings |
| `canny_image` | `[batch, 3, 512, 512]` | FP16 | Canny edge map in [0, 1] (white-on-black, replicated 3×). For CFG, pass zeros for the uncond branch. The ControlNet residual contribution is linear in `canny_image`, so on-device callers can pre-multiply it by a `controlNetScale` ∈ [0, 1] factor instead of carrying a separate scalar input. |

| Output | Shape | dtype | Notes |
| --- | --- | --- | --- |
| `down_residual_0..11` | 12 tensors | FP16 | Down-block residuals fed into the SD 1.5 UNet's skip connections |
| `mid_residual` | `[batch, 1280, 8, 8]` | FP16 | Mid-block residual |

Down-block residual canonical shapes (per SD 1.5 UNet): [batch, 320, 64, 64] ×3, [batch, 320, 32, 32], [batch, 640, 32, 32] ×2, [batch, 640, 16, 16], [batch, 1280, 16, 16] ×2, [batch, 1280, 8, 8] ×3.
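
These shapes can be written down as a checkable list (a minimal sketch; channel and spatial values only, batch dimension omitted):

```python
# down_residual_0..11 shapes, in output order, for a 64x64 latent (512x512 image).
DOWN_RESIDUAL_SHAPES = (
    [(320, 64, 64)] * 3
    + [(320, 32, 32)]
    + [(640, 32, 32)] * 2
    + [(640, 16, 16)]
    + [(1280, 16, 16)] * 2
    + [(1280, 8, 8)] * 3
)
MID_RESIDUAL_SHAPE = (1280, 8, 8)

assert len(DOWN_RESIDUAL_SHAPES) == 12
# The last down-block residual matches the mid-block resolution.
assert DOWN_RESIDUAL_SHAPES[-1] == MID_RESIDUAL_SHAPE
```

A consumer can iterate this list to validate session outputs against the expected channel/resolution pattern before feeding them to the UNet.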

How it was made

Pinned conversion environment:

| Package | Version |
| --- | --- |
| diffusers | 0.27.2 |
| transformers | 4.40.0 |
| torch | 2.3.0 |
| onnx | 1.16.0 |
| onnxruntime | 1.18.0 |
| numpy | < 2 (ABI compat) |
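
The pins above map directly onto a requirements file (a sketch; the numpy line mirrors the `<2` ABI note):

```
diffusers==0.27.2
transformers==4.40.0
torch==2.3.0
onnx==1.16.0
onnxruntime==1.18.0
numpy<2
```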

Conversion sequence:

  1. Load lllyasviel/control_v11p_sd15_canny ControlNet model at FP16.
  2. Wrap to expose 13 named outputs (down_residual_0..11, mid_residual).
  3. torch.onnx.export at opset 17 with FP16 dummy inputs at canonical SD 1.5 shapes.

Re-running the conversion from the same pinned environment produces byte-identical output (same sha256). Conversion artefacts include a spike report with full validation metrics and arithmetic round-trip checks against the PyTorch reference.
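
The byte-identical claim can be verified with a stdlib hash helper (a sketch; point it at your local copy of model.onnx):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()
```

Run it over two independently produced model.onnx files; matching hex digests confirm byte-identical conversions.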

Files

| File | Size | sha256 |
| --- | --- | --- |
| `model.onnx` | 723,055,316 B (689.6 MB) | `399358929322eb5bb2f0e141e23486397a29ec871e4efa625c0f2ba4d418c698` |

No external-data sidecar: graph + weights fit under the 2 GB protobuf single-file limit.
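
The headroom under that limit is easy to compute (protobuf caps a single serialized message at 2**31 - 1 bytes; the size below is from the Files table):

```python
MODEL_BYTES = 723_055_316   # model.onnx size from the Files table
PROTOBUF_LIMIT = 2**31 - 1  # hard cap on one serialized protobuf message

assert MODEL_BYTES < PROTOBUF_LIMIT
headroom_gb = (PROTOBUF_LIMIT - MODEL_BYTES) / 10**9
# About 1.42 GB of headroom before an external-data sidecar would be needed.
```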

Licence

CreativeML OpenRAIL-M, matching the upstream ControlNet weights (lllyasviel/control_v11p_sd15_canny).

Memory footprint

The ORT CPU EP promotes FP16 weights to FP32 at session load (~1.4 GB resident). On Android, NNAPI / XNNPACK execute FP16 natively, so the on-device working set is closer to the FP16 disk size plus activation buffers. Sona Forge gates this pack to Tier C devices (≥ 11 GB total RAM) via RamGate.requiresControlNetTier.
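
The ~1.4 GB figure is consistent with simply doubling the FP16 weight bytes when the CPU EP promotes them to FP32 (a back-of-envelope sketch; activation buffers and runtime overhead come on top):

```python
FP16_DISK_BYTES = 723_055_316                 # model.onnx size from the Files table
fp32_weight_gb = FP16_DISK_BYTES * 2 / 10**9  # each 2-byte weight widens to 4 bytes
# ~1.45 GB of weights alone, in line with the ~1.4 GB resident figure.
```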

Usage

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# CFG batch = 2: [uncond, cond].
sample = np.random.randn(2, 4, 64, 64).astype(np.float16)
timestep = np.array([999.0, 999.0], dtype=np.float16)
encoder_hidden_states = np.random.randn(2, 77, 768).astype(np.float16)

# canny_image: zeros for the uncond branch, pre-scaled edges for the cond branch.
canny_one = np.random.rand(1, 3, 512, 512).astype(np.float16)  # white-on-black, 3-channel replicated, [0, 1]
controlnet_scale = 0.7
canny_image = np.concatenate([
    np.zeros_like(canny_one),
    canny_one * controlnet_scale,
], axis=0)

residuals = session.run(None, {
    "sample": sample,
    "timestep": timestep,
    "encoder_hidden_states": encoder_hidden_states,
    "canny_image": canny_image,
})
# 12 down-block residuals + 1 mid-block residual, fed into the
# residual-accepting IP-Adapter UNet.
```
