metadata
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- zoomldm
- remote-sensing
- naip
- latent-diffusion
- custom-pipeline
- arxiv:2411.16969
widget:
- src: demo_images/input.jpeg
  prompt: NAIP sample conditioned on demo SSL feature (mag=0)
  output:
    url: demo_images/output.jpeg
The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or undesired output, please contact me at bili_sakura@zju.edu.cn.
BiliSakura/ZoomLDM-naip
Diffusers-format NAIP variant of ZoomLDM with a bundled custom pipeline and local ldm modules.
Known Issue
- Current NAIP generations may look incorrect (BRCA-like) even with valid NAIP demo inputs.
- Root cause: the upstream raw checkpoints currently available for naip and brca are byte-identical, so conversion reproduces the same model weights.
- This repo will be updated once a distinct NAIP checkpoint is available.
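If you have both upstream raw checkpoints on disk, the byte-identity claim can be checked by comparing file sizes and SHA-256 digests. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and are not part of this repo, and you supply your own local file paths.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_byte_identical(path_a, path_b):
    """True when both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes can never be byte-identical
    return sha256_of_file(a) == sha256_of_file(b)
```

Calling `are_byte_identical(naip_path, brca_path)` with the paths to your two downloaded checkpoint files returns `True` only when size and digest both match.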
Model Description
- Architecture: ZoomLDM latent diffusion pipeline (UNet + VAE + conditioning encoder)
- Domain: Remote sensing imagery (NAIP)
- Conditioning: DINOv2-style SSL feature maps + magnification level (0..4)
- Format: Self-contained local folder for DiffusionPipeline.from_pretrained(...)
Intended Use
Use this model for conditional multi-scale NAIP image generation when you already have pre-extracted SSL features.
Out-of-Scope Use
- Not intended for clinical or safety-critical decisions.
- Not a general-purpose text-to-image model.
- Performance may degrade under domain shift or mismatched feature extractors.
Files
- unet/, vae/, conditioning_encoder/, scheduler/
- model_index.json
- pipeline_zoomldm.py
- ldm/ (bundled dependency modules)
Usage
import torch
from diffusers import DiffusionPipeline

# Load the bundled custom pipeline from the repo.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/ZoomLDM-naip",
    custom_pipeline="pipeline_zoomldm.py",
    trust_remote_code=True,
).to("cuda")

# ssl_feat_tensor: pre-extracted SSL features that you supply (not defined here).
out = pipe(
    ssl_features=ssl_feat_tensor.to("cuda"),  # (B, 1024, H, W), typically H=W=4
    magnification=torch.tensor([0]).to("cuda"),  # 0..4
    num_inference_steps=50,
    guidance_scale=2.0,
)
images = out.images
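Because the pipeline expects features shaped (B, 1024, H, W) and magnification codes in 0..4, it can help to validate both before moving anything to the GPU. The helper below is a hypothetical sketch, not part of this repo. It takes the channel width and code range from this card, and it assumes one magnification code per batch element, which the bundled pipeline may or may not require.

```python
EXPECTED_CHANNELS = 1024         # channel width from the usage comment in this card
VALID_MAGNIFICATIONS = range(5)  # integer codes 0..4, as stated in this card


def check_conditioning_inputs(feature_shape, magnifications):
    """Raise ValueError if the feature shape or magnification codes look wrong.

    feature_shape: shape tuple of the SSL feature tensor, expected (B, 1024, H, W).
    magnifications: iterable of integer codes (assumed: one per batch element).
    Returns the batch size on success.
    """
    if len(feature_shape) != 4:
        raise ValueError(f"expected 4 dims (B, C, H, W), got {len(feature_shape)}")
    batch, channels, height, width = feature_shape
    if channels != EXPECTED_CHANNELS:
        raise ValueError(f"expected {EXPECTED_CHANNELS} channels, got {channels}")
    if height < 1 or width < 1:
        raise ValueError("spatial dimensions must be at least 1")
    mags = list(magnifications)
    if len(mags) != batch:
        raise ValueError(f"got {len(mags)} magnification codes for batch size {batch}")
    for mag in mags:
        # Reject bools and non-integers explicitly; then check the 0..4 range.
        if isinstance(mag, bool) or not isinstance(mag, int):
            raise ValueError(f"magnification must be an int, got {mag!r}")
        if mag not in VALID_MAGNIFICATIONS:
            raise ValueError(f"magnification {mag} is outside 0..4")
    return batch
```

With PyTorch tensors, a call could look like `check_conditioning_inputs(tuple(ssl_feat_tensor.shape), magnification.tolist())`.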
Demo Generation (dataset-backed)
This repo includes run_demo_inference.py, which uses local repo assets only:
- image: demo_images/input.jpeg
- SSL feature: demo_data/0_ssl_feat.npy
- magnification label: 2 (3x level)
Run:
python run_demo_inference.py
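To reuse the bundled demo feature in your own script, the .npy file can be loaded with NumPy and given a batch axis. This is only a sketch under assumptions: the stored layout of demo_data/0_ssl_feat.npy is not documented in this card, so the code accepts either (C, H, W) or (B, C, H, W), and the function names are hypothetical. run_demo_inference.py may prepare the feature differently.

```python
import numpy as np


def add_batch_axis(feat):
    """Return a float32 array with 4 dims, prepending B=1 to a 3-dim input."""
    feat = np.asarray(feat, dtype=np.float32)
    if feat.ndim == 3:  # assumed (C, H, W): prepend a batch axis of size 1
        feat = feat[None, ...]
    if feat.ndim != 4:
        raise ValueError(f"expected 3 or 4 dims, got {feat.ndim}")
    return feat


def load_ssl_feature(path):
    """Load a .npy SSL feature file and return it as a batched float32 array."""
    return add_batch_axis(np.load(path))
```

The result can then be converted with `torch.from_numpy(load_ssl_feature("demo_data/0_ssl_feat.npy"))` and passed as `ssl_features`.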
Limitations
- Requires correctly precomputed NAIP conditioning features.
- Magnification conditioning must match expected integer codes.
- Outputs can contain dataset artifacts or biases inherited from training data.
Citation
@InProceedings{Yellapragada_2025_CVPR,
author = {Yellapragada, Srikar and Graikos, Alexandros and Triaridis, Kostas and Prasanna, Prateek and Gupta, Rajarsi and Saltz, Joel and Samaras, Dimitris},
title = {ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {23453-23463}
}