# Sona Forge – SD 1.5 ControlNet Canny (ONNX FP16)
ONNX FP16 export of the SD 1.5 ControlNet Canny encoder, used by the Sona Forge Android app for pose / composition stability on portrait avatars (Phase 7). Pair with `sona-forge/sd15-ipadapter-fp16` (residual-accepting variant, revision ≥ 1.1.0) and `sona-forge/clip-vit-h-14-image-fp16`.
## ONNX shape
| Input | Shape | dtype | Notes |
|---|---|---|---|
| `sample` | `[batch, 4, 64, 64]` | FP16 | latent state at step t |
| `timestep` | `[batch]` | FP16 | scheduler timestep |
| `encoder_hidden_states` | `[batch, 77, 768]` | FP16 | text embeds |
| `canny_image` | `[batch, 3, 512, 512]` | FP16 | Canny edge map in [0, 1] (white-on-black, replicated 3×). For CFG, pass zeros for the uncond branch. ControlNet's residual contribution is linear in `canny_image`, so on-device callers can pre-multiply it by a `controlNetScale` ∈ [0, 1] factor instead of carrying a scalar input. |
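The `canny_image` packing (replicate to 3 channels, normalise to [0, 1], pre-multiply the scale, cast to FP16) can be sketched in numpy. `to_canny_input` is a hypothetical helper invented here, and the toy edge map stands in for a real Canny detector's output:

```python
import numpy as np

def to_canny_input(edge_map_u8: np.ndarray, controlnet_scale: float = 1.0) -> np.ndarray:
    """Pack a single-channel uint8 edge map (512, 512), white-on-black, into the
    model's canny_image layout: [1, 3, 512, 512] FP16 in [0, 1], scale pre-multiplied."""
    assert edge_map_u8.shape == (512, 512) and edge_map_u8.dtype == np.uint8
    x = edge_map_u8.astype(np.float32) / 255.0        # uint8 -> [0, 1]
    x = np.repeat(x[None, None, :, :], 3, axis=1)     # replicate to 3 channels (NCHW)
    return (x * controlnet_scale).astype(np.float16)  # fold in controlNetScale

# Toy edge map standing in for a real Canny detector's output.
edges = np.zeros((512, 512), dtype=np.uint8)
edges[100:400, 256] = 255  # one vertical white edge on black
cond = to_canny_input(edges, controlnet_scale=0.7)
```

For CFG, `np.zeros_like(cond)` supplies the uncond branch, as in the Usage section below.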
| Output | Shape | dtype | Notes |
|---|---|---|---|
| `down_residual_0..11` | 12 tensors (see below) | FP16 | down-block residuals fed into the SD 1.5 UNet's skip connections |
| `mid_residual` | `[batch, 1280, 8, 8]` | FP16 | mid-block residual |
Down-block residual canonical shapes (per the SD 1.5 UNet):
`[batch, 320, 64, 64]` ×3, `[batch, 320, 32, 32]`, `[batch, 640, 32, 32]` ×2, `[batch, 640, 16, 16]`, `[batch, 1280, 16, 16]` ×2, `[batch, 1280, 8, 8]` ×3.
## How it was made
Pinned conversion environment:
| Package | Version |
|---|---|
| diffusers | 0.27.2 |
| transformers | 4.40.0 |
| torch | 2.3.0 |
| onnx | 1.16.0 |
| onnxruntime | 1.18.0 |
| numpy | <2 (ABI compat) |
Conversion sequence:
- Load the `lllyasviel/control_v11p_sd15_canny` ControlNet model at FP16.
- Wrap it to expose 13 named outputs (`down_residual_0..11`, `mid_residual`).
- Run `torch.onnx.export` at opset 17 with FP16 dummy inputs at the canonical SD 1.5 shapes.
Re-running the conversion from the same pinned environment produces byte-identical output (same sha256). Conversion artefacts include a spike report with full validation metrics and arithmetic round-trip checks against the PyTorch reference.
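The wrapping step can be sketched roughly as below. `ControlNetExportWrapper` is a name invented here, and the actual spike script may differ; the sketch assumes diffusers' `ControlNetModel.forward(..., return_dict=False)` returning `(down_block_res_samples, mid_block_res_sample)`:

```python
import torch

class ControlNetExportWrapper(torch.nn.Module):
    """Flatten the ControlNet's (down_residuals, mid_residual) output into
    13 positional tensors so torch.onnx.export can name them individually."""

    def __init__(self, controlnet: torch.nn.Module):
        super().__init__()
        self.controlnet = controlnet

    def forward(self, sample, timestep, encoder_hidden_states, canny_image):
        down, mid = self.controlnet(
            sample,
            timestep,
            encoder_hidden_states=encoder_hidden_states,
            controlnet_cond=canny_image,
            return_dict=False,
        )
        return (*down, mid)  # down_residual_0..11, mid_residual

# Export call (assumes `controlnet` is a diffusers ControlNetModel loaded at FP16):
# torch.onnx.export(
#     ControlNetExportWrapper(controlnet),
#     (sample, timestep, encoder_hidden_states, canny_image),
#     "model.onnx",
#     opset_version=17,
#     input_names=["sample", "timestep", "encoder_hidden_states", "canny_image"],
#     output_names=[f"down_residual_{i}" for i in range(12)] + ["mid_residual"],
# )
```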
## Files
| File | Size | sha256 |
|---|---|---|
| `model.onnx` | 723,055,316 B (689.6 MB) | `399358929322eb5bb2f0e141e23486397a29ec871e4efa625c0f2ba4d418c698` |
No external-data sidecar – the graph + weights fit under the 2 GB protobuf single-file limit.
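A downloaded copy can be checked against the published digest with a short stdlib sketch:

```python
import hashlib

EXPECTED_SHA256 = "399358929322eb5bb2f0e141e23486397a29ec871e4efa625c0f2ba4d418c698"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks to avoid loading ~690 MB into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# assert sha256_of("model.onnx") == EXPECTED_SHA256
```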
## Licence

CreativeML OpenRAIL-M – matches the upstream ControlNet weights (`lllyasviel/control_v11p_sd15_canny`).
## Memory footprint

The ORT CPU EP promotes FP16 to FP32 at session load (~1.4 GB resident). On Android, NNAPI / XNNPACK execute FP16 natively, so the on-device working set is closer to the FP16 disk size plus activation buffers. Sona Forge gates this pack to Tier C devices (≥ 11 GB total RAM) per `RamGate.requiresControlNetTier`.
## Usage
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# CFG batch = 2: row 0 = uncond, row 1 = cond.
sample = np.random.randn(2, 4, 64, 64).astype(np.float16)
timestep = np.array([999.0, 999.0], dtype=np.float16)
encoder_hidden_states = np.random.randn(2, 77, 768).astype(np.float16)

# canny_image: zeros for the uncond branch, scaled edges for the cond branch.
canny_one = np.random.rand(1, 3, 512, 512).astype(np.float16)  # white-on-black, 3-channel replicated, [0..1]
controlnet_scale = 0.7
canny_image = np.concatenate([
    np.zeros_like(canny_one),
    canny_one * controlnet_scale,
], axis=0)

residuals = session.run(None, {
    "sample": sample,
    "timestep": timestep,
    "encoder_hidden_states": encoder_hidden_states,
    "canny_image": canny_image,
})
# 13 outputs: 12 down-block residuals + 1 mid-block residual, fed into the
# residual-accepting IP-Adapter UNet.
```
## Provenance

- Original ControlNet weights: `lllyasviel/control_v11p_sd15_canny`.
- Companion SD 1.5 UNet (residual-accepting variant): `sona-forge/sd15-ipadapter-fp16` (revision ≥ 1.1.0 supports the 13-residual signature).
## Model tree for sona-forge/sd15-controlnet-canny-fp16

Base model: `runwayml/stable-diffusion-v1-5`