LWD — Learning When to Denoise

EMA weights for "Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion."

These are the EMA weights of the LightningDiT-XL/1 (675M-parameter) denoiser trained with our learned asynchronous semantic–texture schedule on class-conditional ImageNet 256×256.

Checkpoints

| File | Training budget | Unguided FID | AutoGuidance FID |
|---|---|---|---|
| xl_400k.pt | 400K iter (≈80 epochs) | 2.87 | 1.14 |
| xl_1m.pt | 1M iter (≈200 epochs) | 2.37 | 1.05 |
| xl_3m.pt | 3M iter (≈600 epochs) | 2.14 | 1.02 |
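
The iteration and epoch figures for the three checkpoints can be cross-checked with a little arithmetic. The sketch below assumes the standard ImageNet-1k training split of 1,281,167 images and a global batch size of 256. The batch size is not stated in this card; it is only the value that makes the listed numbers line up, so treat `ASSUMED_GLOBAL_BATCH` as a hypothetical and check the code repository for the actual training configuration.

```python
# Assumption: standard ImageNet-1k training split size.
IMAGENET_TRAIN_IMAGES = 1_281_167
# Assumption: hypothetical global batch size, inferred from the listed
# iteration/epoch pairs and NOT stated in the model card.
ASSUMED_GLOBAL_BATCH = 256


def iters_to_epochs(iterations: int,
                    batch_size: int = ASSUMED_GLOBAL_BATCH,
                    dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Convert a number of optimizer steps into passes over the dataset."""
    return iterations * batch_size / dataset_size


if __name__ == "__main__":
    for name, iters in [("xl_400k.pt", 400_000),
                        ("xl_1m.pt", 1_000_000),
                        ("xl_3m.pt", 3_000_000)]:
        print(f"{name}: about {iters_to_epochs(iters):.1f} epochs")
```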

Each file is a slim checkpoint of the form {'ema': state_dict} and can be used as a drop-in with the inference script in the code repository.
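
For anyone who wants to inspect a checkpoint outside the inference script, here is a minimal sketch of unpacking the {'ema': state_dict} layout. The helper names `extract_ema_state_dict` and `load_ema_state_dict` are hypothetical and not part of the code repository, and the sketch assumes the file holds only tensors so that `torch.load(..., weights_only=True)` can read it.

```python
from typing import Any, Mapping


def extract_ema_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the EMA state dict from a slim checkpoint of the form {'ema': ...}."""
    if "ema" not in checkpoint:
        raise KeyError(
            f"expected an 'ema' entry, found keys: {sorted(checkpoint.keys())}"
        )
    return checkpoint["ema"]


def load_ema_state_dict(ckpt_path: str) -> Mapping[str, Any]:
    """Load a checkpoint file on CPU and return its EMA weights.

    Hypothetical helper: assumes the file contains only tensors.
    """
    import torch  # imported lazily so extract_ema_state_dict works without torch

    checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
    return extract_ema_state_dict(checkpoint)
```

The returned mapping can then be passed to `load_state_dict` on a model instance built by the code repository; how that model is constructed is left to the repository itself.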

Usage

from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download("bsq532087/LWD", "xl_3m.pt")
# then point the code repo's inference config / --ckpt at `ckpt_path`
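
To switch between training budgets without retyping filenames, the download call can be wrapped as sketched below. The filenames and repository ID are taken from this card; the `CHECKPOINTS` mapping and the two function names are hypothetical conveniences, not part of the code repository.

```python
REPO_ID = "bsq532087/LWD"

# Budget label -> filename, as listed under Checkpoints.
CHECKPOINTS = {
    "400k": "xl_400k.pt",
    "1m": "xl_1m.pt",
    "3m": "xl_3m.pt",
}


def checkpoint_filename(budget: str) -> str:
    """Map a budget label such as '3m' (case-insensitive) to its filename."""
    key = budget.strip().lower()
    if key not in CHECKPOINTS:
        raise ValueError(
            f"unknown budget {budget!r}; choose from {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[key]


def download_checkpoint(budget: str = "3m") -> str:
    """Download the chosen checkpoint and return its local cache path."""
    from huggingface_hub import hf_hub_download  # lazy import; needs network access

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(budget))
```

Calling `download_checkpoint("1m")` returns a local path that can be passed to the inference config or `--ckpt`, exactly like `ckpt_path` above.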

The texture latent decoder (SD-VAE f16-d32) and the SemVAE semantic encoder are inherited from SFD / LightningDiT; see the code repository for how to obtain them.

License & attribution

Released under the MIT License. The denoiser backbone derives from LightningDiT and the semantic-first latent setup / SemVAE encoder from SFD; please also respect the licenses of those projects.

Citation

@article{qian2026learning,
  title   = {Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion},
  author  = {Qian, Bingshuo and Cheng, Xiang},
  journal = {arXiv preprint arXiv:2606.19662},
  year    = {2026},
}