SDXL-Lightning 4-step (ONNX, fp16, ORT-Web compatible)

ONNX export of ByteDance/SDXL-Lightning 4-step UNet merged into stabilityai/stable-diffusion-xl-base-1.0, with the VAE decoder replaced by madebyollin/sdxl-vae-fp16-fix for fp16 stability. Layout is a diffusers-style pipeline with per-subfolder ONNX files and bundled tokenizers, intended for in-browser inference via ONNX Runtime Web (WebGPU EP).

Contents

| Subfolder | Size | Notes |
|---|---|---|
| text_encoder/ | 236 MB | CLIP-L/14, fp16 |
| text_encoder_2/ | 1.3 GB | CLIP-G/14 (OpenCLIP-ViT-bigG-14), fp16 |
| unet/ | 4.8 GB | SDXL-Lightning 4-step UNet, fp16 |
| vae_decoder/ | 95 MB | sdxl-vae-fp16-fix decoder, fp16 |
| tokenizer/, tokenizer_2/ | small | Fast tokenizers (tokenizer.json present) |
| scheduler/ | small | EulerDiscrete, timestep_spacing="trailing" |
| model_index.json | small | Diffusers pipeline manifest |

Total: ~6.5 GB.

Recommended usage

Designed for 4 denoising steps, classifier-free guidance disabled (CFG = 1.0). Guidance > 1 breaks Lightning. Recommended scheduler is the bundled EulerDiscreteScheduler with timestep_spacing="trailing".
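For intuition, "trailing" spacing takes the last timestep of each equal chunk of the training schedule, so the first step starts at the highest-noise timestep. A minimal pure-Python sketch of that arithmetic (1000 training timesteps assumed, as in the SDXL defaults):

```python
# Sketch of "trailing" timestep spacing for 4 inference steps over an
# assumed 1000-step training schedule.
num_train_timesteps = 1000
num_inference_steps = 4
step = num_train_timesteps // num_inference_steps  # 250

# Walk backwards from the end of the schedule.
timesteps = [num_train_timesteps - 1 - i * step for i in range(num_inference_steps)]
print(timesteps)  # [999, 749, 499, 249]
```

By contrast, "leading" spacing would start counting from timestep 0 and never visit the final high-noise timestep, which is why trailing matters for few-step distilled models like Lightning.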

The export targets ORT-Web's WebGPU EP. The CPU EP passes a local sanity check; the WebGPU EP is the production target but supports a narrower op set. If a runtime op error appears in the browser, downgrade the opset or rebuild the export with a different toolchain version.

Production notes

  • Resize ops are kept at fp32, with casts auto-inserted at the boundary; onnxconverter-common's default block list catches most cases, but the scales Constant inputs had to be hand-patched back to fp32 because the converter fails to insert casts for Constant-produced inputs (2 in the UNet, 3 in the VAE).
  • I/O dtypes are fp32 throughout (keep_io_types=True) so JavaScript callers can feed unconverted fp32 tensors and read fp32 outputs.
  • The vae_encoder/ from the original optimum export was dropped; Lightning is text-to-image only.

Licenses

This is a derivative work combining three upstream sources (stabilityai/stable-diffusion-xl-base-1.0, ByteDance/SDXL-Lightning, madebyollin/sdxl-vae-fp16-fix), each with its own license. All three permit redistribution, but the RAIL-family licenses attach use restrictions, so read them before commercial use.

The combined work is released under CreativeML Open RAIL++-M (the more restrictive of the upstream licenses).

How it was built

Reproduction recipe (CPU-only Windows box):

  1. Construct the SDXL UNet from stabilityai/stable-diffusion-xl-base-1.0 config and load sdxl_lightning_4step_unet.safetensors from ByteDance/SDXL-Lightning into it.
  2. Save the merged pipeline as a full diffusers pipeline.
  3. optimum-cli export onnx --task stable-diffusion-xl --framework pt → per-subfolder ONNX files at fp32 (~13 GB).
  4. Convert UNet + both text encoders to fp16 in place via onnxconverter-common.float16.convert_float_to_float16 with a custom post-pass that reverts Resize-feeding Constants back to fp32.
  5. Replace the original VAE decoder with a fresh ONNX export of madebyollin/sdxl-vae-fp16-fix, fp16-converted with the same post-pass.
  6. Build fast tokenizers (tokenizer.json) from the slow-tokenizer files optimum-cli dropped, since transformers.js v3 has no slow-tokenizer fallback.
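Step 6 produces a single tokenizer.json blob per tokenizer, which is the only format transformers.js v3 loads. A toy illustration with the tokenizers library (the vocab and merges here are made up; the real ones come from the slow tokenizer's vocab.json/merges.txt):

```python
from tokenizers import Tokenizer, models

# Toy BPE tokenizer (vocab/merges are illustrative, not the real CLIP files).
vocab = {"<|endoftext|>": 0, "a": 1, "b": 2, "ab": 3}
merges = [("a", "b")]
tok = Tokenizer(models.BPE(vocab=vocab, merges=merges, unk_token="<|endoftext|>"))

blob = tok.to_str()          # the JSON that would be written as tokenizer.json
ids = tok.encode("ab").ids   # 'a' + 'b' merge into token 'ab'
```

The actual build loads the slow-tokenizer files with transformers and saves the fast variant; this sketch only shows the shape of the artifact being produced.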

Built with torch==2.4.1+cpu, optimum[exporters]==1.23.3, transformers==4.45.2, diffusers==0.30.3, onnx==1.17.0, onnxruntime==1.20.1, onnxconverter-common==1.14.0.
