SDXL-Lightning 4-step (ONNX, fp16, ORT-Web compatible)

ONNX export of ByteDance/SDXL-Lightning 4-step UNet merged into stabilityai/stable-diffusion-xl-base-1.0, with the VAE decoder replaced by madebyollin/sdxl-vae-fp16-fix for fp16 stability. Layout is a diffusers-style pipeline with per-subfolder ONNX files and bundled tokenizers, intended for in-browser inference via ONNX Runtime Web (WebGPU EP).

Contents

| Subfolder | Size | Notes |
|---|---|---|
| text_encoder/ | 236 MB | CLIP-L/14, fp16 |
| text_encoder_2/ | 1.3 GB | CLIP-G/14 (OpenCLIP-ViT-bigG-14), fp16 |
| unet/ | 4.8 GB | SDXL-Lightning 4-step UNet, fp16 |
| vae_decoder/ | 95 MB | sdxl-vae-fp16-fix decoder, fp16 |
| tokenizer/, tokenizer_2/ | small | Fast tokenizers (tokenizer.json present) |
| scheduler/ | small | EulerDiscrete, timestep_spacing="trailing" |
| model_index.json | small | Diffusers pipeline manifest |

Total: ~6.5 GB.

Recommended usage

Designed for 4 denoising steps, classifier-free guidance disabled (CFG = 1.0). Guidance > 1 breaks Lightning. Recommended scheduler is the bundled EulerDiscreteScheduler with timestep_spacing="trailing".
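For intuition, "trailing" spacing takes the last timestep of each equal chunk of the training schedule, so the first step starts at the highest-noise timestep. A minimal pure-Python sketch of that arithmetic (1000 training timesteps assumed, as in the SDXL defaults):

```python
# Sketch of "trailing" timestep spacing for 4 inference steps over an
# assumed 1000-step training schedule.
num_train_timesteps = 1000
num_inference_steps = 4
step = num_train_timesteps // num_inference_steps  # 250

# Walk backwards from the end of the schedule.
timesteps = [num_train_timesteps - 1 - i * step for i in range(num_inference_steps)]
print(timesteps)  # [999, 749, 499, 249]
```

By contrast, "leading" spacing would start counting from timestep 0 and never visit the final high-noise timestep, which is why trailing matters for few-step distilled models like Lightning.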

The export targets ORT-Web's WebGPU EP. The CPU EP passes a local sanity check; the WebGPU EP is the production target but supports a narrower op set. If a runtime op error appears in the browser, downgrade the opset or rebuild the export with a different toolchain version.

Production notes

  • Resize ops are kept at fp32, with casts auto-inserted at the boundary; onnxconverter-common's default block list catches most cases, but the scales Constant inputs had to be hand-patched back to fp32 because the converter fails to insert casts for Constant-produced inputs (2 in the UNet, 3 in the VAE).
  • I/O dtypes are fp32 throughout (keep_io_types=True) so JavaScript callers can feed unconverted fp32 tensors and read fp32 outputs.
  • The vae_encoder/ from the original optimum export was dropped; Lightning is text-to-image only.

Licenses

This is a derivative work combining three upstream sources (stabilityai/stable-diffusion-xl-base-1.0, ByteDance/SDXL-Lightning, madebyollin/sdxl-vae-fp16-fix), each with its own license. All three permit redistribution, but the RAIL-family licenses attach use restrictions, so read them before commercial use.

The combined work is released under CreativeML Open RAIL++-M (the more restrictive of the upstream licenses).

How it was built

Reproduction recipe (CPU-only Windows box):

  1. Construct the SDXL UNet from stabilityai/stable-diffusion-xl-base-1.0 config and load sdxl_lightning_4step_unet.safetensors from ByteDance/SDXL-Lightning into it.
  2. Save the merged pipeline as a full diffusers pipeline.
  3. optimum-cli export onnx --task stable-diffusion-xl --framework pt → per-subfolder ONNX files at fp32 (~13 GB).
  4. Convert UNet + both text encoders to fp16 in place via onnxconverter-common.float16.convert_float_to_float16 with a custom post-pass that reverts Resize-feeding Constants back to fp32.
  5. Replace the original VAE decoder with a fresh ONNX export of madebyollin/sdxl-vae-fp16-fix, fp16-converted with the same post-pass.
  6. Build fast tokenizers (tokenizer.json) from the slow-tokenizer files optimum-cli dropped, since transformers.js v3 has no slow-tokenizer fallback.
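Step 6 produces a single tokenizer.json blob per tokenizer, which is the only format transformers.js v3 loads. A toy illustration with the tokenizers library (the vocab and merges here are made up; the real ones come from the slow tokenizer's vocab.json/merges.txt):

```python
from tokenizers import Tokenizer, models

# Toy BPE tokenizer (vocab/merges are illustrative, not the real CLIP files).
vocab = {"<|endoftext|>": 0, "a": 1, "b": 2, "ab": 3}
merges = [("a", "b")]
tok = Tokenizer(models.BPE(vocab=vocab, merges=merges, unk_token="<|endoftext|>"))

blob = tok.to_str()          # the JSON that would be written as tokenizer.json
ids = tok.encode("ab").ids   # 'a' + 'b' merge into token 'ab'
```

The actual build loads the slow-tokenizer files with transformers and saves the fast variant; this sketch only shows the shape of the artifact being produced.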

Built with torch==2.4.1+cpu, optimum[exporters]==1.23.3, transformers==4.45.2, diffusers==0.30.3, onnx==1.17.0, onnxruntime==1.20.1, onnxconverter-common==1.14.0.
