Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models
Paper • 2505.24260 • Published
ControlNet-based diffusion models for automatic urban design generation, conditioned on site constraints and text descriptions.
Paper: Human-guided urban form generation using multimodal diffusion models, Building and Environment, 2026
Full paper; arXiv; Code & documentation: GitHub
Six checkpoints covering two cities × three pipeline steps:

| Checkpoint | City | Step |
|---|---|---|
| `checkpoints_step1_nyc` | New York City | Site constraints → Land use + road network |
| `checkpoints_step1_chi` | Chicago | Site constraints → Land use + road network |
| `checkpoints_step2_nyc` | New York City | Land use + roads → Building footprint layout |
| `checkpoints_step2_chi` | Chicago | Land use + roads → Building footprint layout |
| `checkpoints_step3_nyc` | New York City | Building footprints → Satellite image |
| `checkpoints_step3_chi` | Chicago | Building footprints → Satellite image |
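
The naming scheme and step order in the table can be sketched in a few lines of Python. This is a minimal illustration and not the repository's code: `checkpoint_name`, `run_stepwise`, the `generate` callable, the optional `review` hook, and the per-step `prompts` mapping are all hypothetical names introduced here. Only the checkpoint names, city codes, and step order come from the table.

```python
from typing import Any, Callable, Dict, List, Optional

# City codes and step descriptions copied from the checkpoint table.
CITIES: Dict[str, str] = {"nyc": "New York City", "chi": "Chicago"}
STEPS: Dict[int, str] = {
    1: "Site constraints -> Land use + road network",
    2: "Land use + roads -> Building footprint layout",
    3: "Building footprints -> Satellite image",
}


def checkpoint_name(city: str, step: int) -> str:
    """Return the checkpoint name for a city code ("nyc"/"chi") and step (1-3)."""
    if city not in CITIES:
        raise ValueError(f"unknown city code: {city!r}")
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    return f"checkpoints_step{step}_{city}"


def run_stepwise(
    city: str,
    site_constraints: Any,
    prompts: Dict[int, str],
    generate: Callable[[str, Any, str], Any],
    review: Optional[Callable[[int, Any], Any]] = None,
) -> List[Any]:
    """Chain the three steps, feeding each step's output to the next.

    `generate(checkpoint, condition, prompt)` is a hypothetical stand-in for
    whatever inference call is actually used. `review(step, output)` is an
    optional, equally hypothetical hook where a designer could inspect or
    edit an intermediate result before the next step consumes it.
    """
    condition = site_constraints
    outputs: List[Any] = []
    for step in sorted(STEPS):
        output = generate(checkpoint_name(city, step), condition, prompts[step])
        if review is not None:
            output = review(step, output)
        outputs.append(output)
        condition = output  # the next step is conditioned on this result
    return outputs
```

Passing a stub for `generate` (for example, one that only records its arguments) is enough to check the wiring before any model weights are loaded.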
Fine-tuned from runwayml/stable-diffusion-v1-5 + ControlNet. Checkpoints are FP16, ~2.9 GB each.
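
For planning downloads, the sizes above imply some rough totals. The sketch below is back-of-the-envelope arithmetic only. It assumes 1 GB = 10^9 bytes and that a checkpoint file holds nothing but FP16 weights (2 bytes per parameter), neither of which is stated in the source, so treat the parameter count as an estimate.

```python
NUM_CITIES = 2
NUM_STEPS = 3
CHECKPOINT_GB = 2.9   # approximate size per checkpoint, as stated above
BYTES_PER_PARAM = 2   # FP16 stores each weight in 2 bytes

num_checkpoints = NUM_CITIES * NUM_STEPS
total_gb = num_checkpoints * CHECKPOINT_GB

# Assumption: 1 GB = 1e9 bytes and the file contains only FP16 weights.
approx_params_billion = CHECKPOINT_GB / BYTES_PER_PARAM

print(f"checkpoints: {num_checkpoints}")
print(f"total download: ~{total_gb:.1f} GB")
print(f"implied parameters per checkpoint: ~{approx_params_billion:.2f} billion")
```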
@article{he2025human,
title = {Human-guided urban form generation using multimodal diffusion models},
author = {He, Mingyi and Liang, Yuebing and Wang, Shenhao and Zheng, Yunhan
and Wang, Qingyi and Zhuang, Dingyi and Tian, Li and Zhao, Jinhua},
journal = {Building and Environment},
pages = {113892},
year = {2025},
doi = {10.1016/j.buildenv.2025.113892}
}
@article{he2025generative,
title = {Generative {AI} for urban design: a stepwise approach integrating
human expertise with multimodal diffusion models},
author = {He, Mingyi and Liang, Yuebing and Wang, Shenhao and Zheng, Yunhan
and Wang, Qingyi and Zhuang, Dingyi and Tian, Li and Zhao, Jinhua},
journal = {arXiv preprint arXiv:2505.24260},
year = {2025}
}