# Sona Forge – SD 1.5 ControlNet Canny (ONNX FP16)
ONNX FP16 export of the SD 1.5 ControlNet Canny encoder, used by the Sona Forge Android app for pose / composition stability on portrait avatars (Phase 7). Pair with `sona-forge/sd15-ipadapter-fp16` (residual-accepting variant, revision ≥ 1.1.0) and `sona-forge/clip-vit-h-14-image-fp16`.
## ONNX shape
| Input | Shape | dtype | Notes |
|---|---|---|---|
| `sample` | `[batch, 4, 64, 64]` | FP16 | latent state at step t |
| `timestep` | `[batch]` | FP16 | scheduler timestep |
| `encoder_hidden_states` | `[batch, 77, 768]` | FP16 | text embeds |
| `canny_image` | `[batch, 3, 512, 512]` | FP16 | Canny edge map in [0, 1] (white-on-black, replicated 3×). For CFG, pass zeros for the uncond branch. ControlNet's residual contribution is linear in `canny_image`, so on-device callers can pre-multiply it by a `controlNetScale` ∈ [0, 1] factor instead of carrying a scalar input. |
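The `canny_image` packing (replicate to 3 channels, normalise to [0, 1], pre-multiply the scale, cast to FP16) can be sketched in numpy. `to_canny_input` is a hypothetical helper invented here, and the toy edge map stands in for a real Canny detector's output:

```python
import numpy as np

def to_canny_input(edge_map_u8: np.ndarray, controlnet_scale: float = 1.0) -> np.ndarray:
    """Pack a single-channel uint8 edge map (512, 512), white-on-black, into the
    model's canny_image layout: [1, 3, 512, 512] FP16 in [0, 1], scale pre-multiplied."""
    assert edge_map_u8.shape == (512, 512) and edge_map_u8.dtype == np.uint8
    x = edge_map_u8.astype(np.float32) / 255.0        # uint8 -> [0, 1]
    x = np.repeat(x[None, None, :, :], 3, axis=1)     # replicate to 3 channels (NCHW)
    return (x * controlnet_scale).astype(np.float16)  # fold in controlNetScale

# Toy edge map standing in for a real Canny detector's output.
edges = np.zeros((512, 512), dtype=np.uint8)
edges[100:400, 256] = 255  # one vertical white edge on black
cond = to_canny_input(edges, controlnet_scale=0.7)
```

For CFG, `np.zeros_like(cond)` supplies the uncond branch, as in the Usage section below.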
| Output | Shape | dtype | Notes |
|---|---|---|---|
| `down_residual_0..11` | 12 tensors (see below) | FP16 | down-block residuals fed into the SD 1.5 UNet's skip connections |
| `mid_residual` | `[batch, 1280, 8, 8]` | FP16 | mid-block residual |
Down-block residual canonical shapes (per the SD 1.5 UNet):
`[batch, 320, 64, 64]` ×3, `[batch, 320, 32, 32]`, `[batch, 640, 32, 32]` ×2, `[batch, 640, 16, 16]`, `[batch, 1280, 16, 16]` ×2, `[batch, 1280, 8, 8]` ×3.
## How it was made
Pinned conversion environment:
| Package | Version |
|---|---|
| diffusers | 0.27.2 |
| transformers | 4.40.0 |
| torch | 2.3.0 |
| onnx | 1.16.0 |
| onnxruntime | 1.18.0 |
| numpy | <2 (ABI compat) |
Conversion sequence:
- Load the `lllyasviel/control_v11p_sd15_canny` ControlNet model at FP16.
- Wrap it to expose 13 named outputs (`down_residual_0..11`, `mid_residual`).
- Run `torch.onnx.export` at opset 17 with FP16 dummy inputs at the canonical SD 1.5 shapes.
Re-running the conversion from the same pinned environment produces byte-identical output (same sha256). Conversion artefacts include a spike report with full validation metrics and arithmetic round-trip checks against the PyTorch reference.
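The wrapping step can be sketched roughly as below. `ControlNetExportWrapper` is a name invented here, and the actual spike script may differ; the sketch assumes diffusers' `ControlNetModel.forward(..., return_dict=False)` returning `(down_block_res_samples, mid_block_res_sample)`:

```python
import torch

class ControlNetExportWrapper(torch.nn.Module):
    """Flatten the ControlNet's (down_residuals, mid_residual) output into
    13 positional tensors so torch.onnx.export can name them individually."""

    def __init__(self, controlnet: torch.nn.Module):
        super().__init__()
        self.controlnet = controlnet

    def forward(self, sample, timestep, encoder_hidden_states, canny_image):
        down, mid = self.controlnet(
            sample,
            timestep,
            encoder_hidden_states=encoder_hidden_states,
            controlnet_cond=canny_image,
            return_dict=False,
        )
        return (*down, mid)  # down_residual_0..11, mid_residual

# Export call (assumes `controlnet` is a diffusers ControlNetModel loaded at FP16):
# torch.onnx.export(
#     ControlNetExportWrapper(controlnet),
#     (sample, timestep, encoder_hidden_states, canny_image),
#     "model.onnx",
#     opset_version=17,
#     input_names=["sample", "timestep", "encoder_hidden_states", "canny_image"],
#     output_names=[f"down_residual_{i}" for i in range(12)] + ["mid_residual"],
# )
```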
## Files
| File | Size | sha256 |
|---|---|---|
| `model.onnx` | 723,055,316 B (689.6 MB) | `399358929322eb5bb2f0e141e23486397a29ec871e4efa625c0f2ba4d418c698` |
No external-data sidecar – the graph + weights fit under the 2 GB protobuf single-file limit.
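A downloaded copy can be checked against the published digest with a short stdlib sketch:

```python
import hashlib

EXPECTED_SHA256 = "399358929322eb5bb2f0e141e23486397a29ec871e4efa625c0f2ba4d418c698"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks to avoid loading ~690 MB into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# assert sha256_of("model.onnx") == EXPECTED_SHA256
```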
## Licence

CreativeML OpenRAIL-M – matches the upstream ControlNet weights (`lllyasviel/control_v11p_sd15_canny`).
## Memory footprint

The ORT CPU EP promotes FP16 to FP32 at session load (~1.4 GB resident). On Android, NNAPI / XNNPACK execute FP16 natively, so the on-device working set is closer to the FP16 disk size plus activation buffers. Sona Forge gates this pack to Tier C devices (≥ 11 GB total RAM) per `RamGate.requiresControlNetTier`.
## Usage
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# CFG batch = 2: row 0 = uncond, row 1 = cond.
sample = np.random.randn(2, 4, 64, 64).astype(np.float16)
timestep = np.array([999.0, 999.0], dtype=np.float16)
encoder_hidden_states = np.random.randn(2, 77, 768).astype(np.float16)

# canny_image: zeros for the uncond branch, scaled edges for the cond branch.
canny_one = np.random.rand(1, 3, 512, 512).astype(np.float16)  # white-on-black, 3-channel replicated, [0..1]
controlnet_scale = 0.7
canny_image = np.concatenate([
    np.zeros_like(canny_one),
    canny_one * controlnet_scale,
], axis=0)

residuals = session.run(None, {
    "sample": sample,
    "timestep": timestep,
    "encoder_hidden_states": encoder_hidden_states,
    "canny_image": canny_image,
})
# 13 outputs: 12 down-block residuals + 1 mid-block residual, fed into the
# residual-accepting IP-Adapter UNet.
```
## Provenance

- Original ControlNet weights: `lllyasviel/control_v11p_sd15_canny`.
- Companion SD 1.5 UNet (residual-accepting variant): `sona-forge/sd15-ipadapter-fp16` (revision ≥ 1.1.0 supports the 13-residual signature).
## Model tree for sona-forge/sd15-controlnet-canny-fp16

Base model: `runwayml/stable-diffusion-v1-5`