LWD — Learning When to Denoise
EMA weights for "Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion."
- 📄 Paper: https://arxiv.org/abs/2606.19662
- 💻 Code: https://github.com/bsq532087/LWD
These are the EMA weights of the LightningDiT-XL/1 (675M-parameter) denoiser trained with our learned asynchronous semantic–texture schedule on class-conditional ImageNet 256×256.
Checkpoints
| File | Training budget | Unguided FID | AutoGuidance FID |
|---|---|---|---|
| xl_400k.pt | 400K iter (≈80 epochs) | 2.87 | 1.14 |
| xl_1m.pt | 1M iter (≈200 epochs) | 2.37 | 1.05 |
| xl_3m.pt | 3M iter (≈600 epochs) | 2.14 | 1.02 |
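
The epoch counts in the table can be sanity-checked with a short calculation. This is only a sketch: the global batch size of 256 is an assumption inferred from the iteration-to-epoch ratio and is not stated in this card, and `iterations_to_epochs` is a hypothetical helper name. The ImageNet-1k training set size (1,281,167 images) is the standard figure.

```python
# Rough conversion from training iterations to ImageNet epochs.
IMAGENET_TRAIN_IMAGES = 1_281_167  # ImageNet-1k training set size
ASSUMED_BATCH_SIZE = 256           # assumption: not stated in this card


def iterations_to_epochs(iterations, batch_size=ASSUMED_BATCH_SIZE,
                         dataset_size=IMAGENET_TRAIN_IMAGES):
    """Return the number of passes over the dataset for a given iteration count."""
    return iterations * batch_size / dataset_size


for name, iters in [("xl_400k.pt", 400_000),
                    ("xl_1m.pt", 1_000_000),
                    ("xl_3m.pt", 3_000_000)]:
    print(f"{name}: {iterations_to_epochs(iters):.1f} epochs")
```

Under that assumption the results land within one epoch of the ≈80, ≈200, and ≈600 figures listed above.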
Each file is a slim checkpoint of the form {'ema': state_dict} and works as a drop-in checkpoint for the inference script in the code repository.
Usage
from huggingface_hub import hf_hub_download
ckpt_path = hf_hub_download("bsq532087/LWD", "xl_3m.pt")
# then point the code repo's inference config / --ckpt at `ckpt_path`
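
To inspect a downloaded checkpoint outside the repository's inference script, something like the following may help. It is a minimal sketch that relies only on the `{'ema': state_dict}` layout described above. `extract_ema`, `count_parameters`, and `load_ema` are hypothetical helper names, not part of the code repository, and `load_ema` assumes PyTorch is installed.

```python
from collections.abc import Mapping


def extract_ema(ckpt):
    """Return the EMA state dict from a slim checkpoint {'ema': state_dict}."""
    if not isinstance(ckpt, Mapping) or "ema" not in ckpt:
        raise KeyError("expected a checkpoint of the form {'ema': state_dict}")
    return ckpt["ema"]


def count_parameters(state_dict):
    """Sum the element counts of all tensors in a state dict.

    The total may include non-trainable buffers, so it need not match the
    nominal parameter count exactly.
    """
    return sum(t.numel() for t in state_dict.values())


def load_ema(path):
    """Load a checkpoint file on CPU and return its EMA state dict."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # weights_only=True restricts unpickling to tensors and plain containers,
    # which is sufficient for a dict of tensors.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    return extract_ema(ckpt)
```

Calling `count_parameters(load_ema(ckpt_path))` should give a total in the neighbourhood of the 675M parameters quoted above. To build the model that this state dict is loaded into, use the model definition from the code repository.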
The texture latent decoder (SD-VAE f16-d32) and the SemVAE semantic encoder are inherited from SFD / LightningDiT; see the code repository for how to obtain them.
License & attribution
Released under the MIT License. The denoiser backbone derives from LightningDiT and the semantic-first latent setup / SemVAE encoder from SFD; please also respect the licenses of those projects.
Citation
@article{qian2026learning,
title = {Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion},
author = {Qian, Bingshuo and Cheng, Xiang},
journal = {arXiv preprint arXiv:2606.19662},
year = {2026},
}