
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
  - zoomldm
  - remote-sensing
  - naip
  - latent-diffusion
  - custom-pipeline
  - arxiv:2411.16969
widget:
  - src: demo_images/input.jpeg
    prompt: NAIP sample conditioned on demo SSL feature (mag=0)
    output:
      url: demo_images/output.jpeg

The checkpoint conversion has not been fully validated. If you encounter a pipeline loading failure or undesired output, please contact me at bili_sakura@zju.edu.cn.

BiliSakura/ZoomLDM-naip

Diffusers-format NAIP variant of ZoomLDM with a bundled custom pipeline and local ldm modules.

Known Issue

Current NAIP generations may look incorrect (BRCA-like), even with valid NAIP demo inputs.
Root cause: the upstream raw checkpoints currently available for naip and brca are byte-identical, so the conversion reproduces the same model weights for both.
This repo will be updated once a distinct NAIP checkpoint is available.
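
To confirm for yourself that two checkpoint files are byte-identical, a minimal sketch is to compare file sizes and SHA-256 digests. The function names `sha256_of` and `same_bytes` are illustrative and not part of this repo, and the checkpoint paths are whatever you downloaded locally.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large checkpoints never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True when both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes can never be byte-identical
    return sha256_of(a) == sha256_of(b)
```

Calling `same_bytes(naip_path, brca_path)` on the two raw checkpoints should return `True` while the issue above persists. Matching digests imply identical contents for all practical purposes.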

Model Description

Architecture: ZoomLDM latent diffusion pipeline (UNet + VAE + conditioning encoder)
Domain: Remote sensing imagery (NAIP)
Conditioning: DINOv2-style SSL feature maps + magnification level (0..4)
Format: Self-contained local folder for DiffusionPipeline.from_pretrained(...)

Intended Use

Use this model for conditional multi-scale NAIP image generation when you already have pre-extracted SSL features.

Out-of-Scope Use

Not intended for clinical or safety-critical decisions.
Not a general-purpose text-to-image model.
Performance may degrade under domain shift or mismatched feature extractors.

Files

unet/, vae/, conditioning_encoder/, scheduler/
model_index.json
pipeline_zoomldm.py
ldm/ (bundled dependency modules)
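
Since pipeline loading failures are a known risk, it can help to check a local copy of the repo against the file list above before loading. The following is a minimal sketch; `missing_entries` and the two constants are hypothetical names, and the expected entries are taken only from the list above.

```python
from pathlib import Path

# Entries taken from the Files list in this README.
EXPECTED_DIRS = ("unet", "vae", "conditioning_encoder", "scheduler", "ldm")
EXPECTED_FILES = ("model_index.json", "pipeline_zoomldm.py")


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means every listed entry is present. It does not validate the contents of those entries.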

Usage

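import torch
from diffusers import DiffusionPipeline

# Load the pipeline bundled with this repo.
# trust_remote_code=True allows the custom pipeline code to be executed.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/ZoomLDM-naip",
    custom_pipeline="pipeline_zoomldm.py",
    trust_remote_code=True,
).to("cuda")

# ssl_feat_tensor is a pre-extracted SSL feature tensor that you supply.
out = pipe(
    ssl_features=ssl_feat_tensor.to("cuda"),          # (B, 1024, H, W), typically H=W=4
    magnification=torch.tensor([0], device="cuda"),   # integer code in 0..4
    num_inference_steps=50,
    guidance_scale=2.0,
)
images = out.images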
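
The call above expects features shaped (B, 1024, H, W). A small shape check before calling the pipeline can catch mismatched features early. This is a sketch only: `batched_feature_shape` is a hypothetical helper, and the assumption that unbatched features are stored as (1024, H, W) is ours, not confirmed by this repo.

```python
def batched_feature_shape(shape, channels=1024):
    """Return the (B, C, H, W) shape implied by a feature array's shape.

    Accepts (C, H, W) or (B, C, H, W); raises ValueError otherwise.
    The unbatched (C, H, W) layout is an assumption, not confirmed by the repo.
    """
    shape = tuple(int(d) for d in shape)
    if len(shape) == 3:
        shape = (1,) + shape  # add the missing batch dimension
    if len(shape) != 4:
        raise ValueError(f"expected 3 or 4 dimensions, got {len(shape)}")
    if shape[1] != channels:
        raise ValueError(f"expected {channels} channels, got {shape[1]}")
    return shape
```

Both NumPy arrays and torch tensors expose a tuple-like `.shape`, so `batched_feature_shape(ssl_feat_tensor.shape)` works with either. Reshape the features to the returned shape before passing them as `ssl_features`.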

Demo Generation (dataset-backed)

This repo includes run_demo_inference.py, which uses local repo assets only:

image: demo_images/input.jpeg
SSL feature: demo_data/0_ssl_feat.npy
magnification label: 2 (3x level)

Run:

python run_demo_inference.py

Limitations

Requires correctly precomputed NAIP conditioning features.
Magnification conditioning must match expected integer codes.
Outputs can contain dataset artifacts or biases inherited from training data.
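
The magnification requirement can be enforced before inference with a small validator. This is a sketch that assumes the valid codes are exactly the integers 0..4 stated in the model description; `validate_magnification` is a hypothetical helper, not part of the bundled pipeline.

```python
VALID_MAGNIFICATIONS = range(0, 5)  # integer codes 0..4, per the model description


def validate_magnification(levels):
    """Return the levels as a list, raising ValueError on any invalid code."""
    checked = []
    for level in levels:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(level, bool) or not isinstance(level, int):
            raise ValueError(f"magnification must be an int, got {level!r}")
        if level not in VALID_MAGNIFICATIONS:
            raise ValueError(f"magnification must be in 0..4, got {level}")
        checked.append(level)
    return checked
```

For example, `torch.tensor(validate_magnification([0, 2]))` builds the `magnification` argument only if every code is valid.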

Citation

@InProceedings{Yellapragada_2025_CVPR,
  author = {Yellapragada, Srikar and Graikos, Alexandros and Triaridis, Kostas and Prasanna, Prateek and Gupta, Rajarsi and Saltz, Joel and Samaras, Dimitris},
  title = {ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  month = {June},
  year = {2025},
  pages = {23453-23463}
}