	---
	license: apache-2.0
	library_name: diffusers
	pipeline_tag: image-to-image
	tags:
	- controlnet
	- remote-sensing
	- arxiv:2404.06637
	widget:
	# GeoSynth-Location-OSM: OSM tile -> satellite image (default lon=-90.2, lat=38.6)
	- src: demo_images/GeoSynth-Location-OSM/input.jpeg
	  prompt: Satellite image features a city neighborhood
	  output:
	    url: demo_images/GeoSynth-Location-OSM/output.jpeg
	# GeoSynth-Location-Canny: Canny edges -> satellite image (default lon=-90.2, lat=38.6)
	- src: demo_images/GeoSynth-Location-Canny/input.jpeg
	  prompt: Satellite image features a city neighborhood
	  output:
	    url: demo_images/GeoSynth-Location-Canny/output.jpeg
	---

	> [!WARNING] The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or undesired output, please contact me at bili_sakura@zju.edu.cn.

	# GeoSynth-ControlNets-Location

	Repository for location-conditioned GeoSynth ControlNets.

	Location (lon/lat) conditioning is the primary workflow for geo-aware synthesis. Default: St. Louis, MO (`lon=-90.2`, `lat=38.6`). Use SatCLIP + CoordNet for full diffusers-style location conditioning.
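
Because the conditioning input is ordered longitude first, swapped arguments are an easy mistake to make. The following is a minimal sketch of a sanity check, assuming coordinates are given in decimal degrees; `check_lon_lat` is a hypothetical helper and is not part of this repository.

```python
def check_lon_lat(lon: float, lat: float) -> tuple[float, float]:
    """Return (lon, lat) as floats, or raise ValueError if out of range."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(
            f"latitude {lat} is outside [-90, 90]; were lon and lat swapped?"
        )
    return float(lon), float(lat)


# Default location from this model card: St. Louis, MO.
lon, lat = check_lon_lat(-90.2, 38.6)
```

A range check cannot catch every swap (many swapped pairs are still valid coordinates), so it is a guard rather than a guarantee.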

	We maintain two repositories—one per base checkpoint—each with its compatible ControlNets:

	| Repo | Base Model | ControlNets |
	|------|------------|-------------|
	| [GeoSynth-ControlNets](https://huggingface.co/BiliSakura/GeoSynth-ControlNets) | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
	| This repo | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |

	*The [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) ControlNet checkpoint is missing from the source repository.

	### This repository

	1. GeoSynth-Location base: converted from `geosynth_sd_loc-v3.ckpt` to diffusers format. The text encoder and UNet are the same as SD 2.1 (not fine-tuned). The original checkpoint also includes a CoordNet branch for `[lon, lat]` conditioning (see Architecture).
	2. ControlNet models: GeoSynth-Location-OSM and GeoSynth-Location-Canny, converted from the SD-style checkpoints under `MVRL/GeoSynth-Location-OSM` and `MVRL/GeoSynth-Location-Canny` and located under [`controlnet/`](controlnet/). GeoSynth-Location-SAM is also listed there, but its checkpoint is still pending.

	### Architecture

	The full location pipeline adds a CoordNet branch to the base LDM:

	- Input: `[lon, lat]` → SatCLIP location encoder → CoordNet (13 stacked cross-attention blocks, inner dim 256, 4 heads) → conditioning injected into UNet
	- ControlNet and CoordNet jointly condition the UNet (see [GeoSynth paper](https://huggingface.co/papers/2404.06637) Figure 3)
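
The CoordNet hyperparameters listed above can be collected in a small configuration object. This is only an illustrative sketch: `CoordNetConfig` is a hypothetical name, not a class from this repository, and the per-head width assumes the inner dimension is split evenly across heads as in standard multi-head attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordNetConfig:
    """Hyperparameters quoted in this model card (names are illustrative)."""

    num_blocks: int = 13   # stacked cross-attention blocks
    inner_dim: int = 256   # attention inner dimension
    num_heads: int = 4     # attention heads per block

    @property
    def head_dim(self) -> int:
        # Assumption: inner_dim is divided evenly among the heads.
        if self.inner_dim % self.num_heads != 0:
            raise ValueError("inner_dim must be divisible by num_heads")
        return self.inner_dim // self.num_heads


cfg = CoordNetConfig()
print(cfg.head_dim)  # 64
```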

	### ControlNet variants (this repo)

	| Control | Subfolder | Status |
	|---------|-----------|--------|
	| OSM | `controlnet/GeoSynth-Location-OSM` | ✅ ready |
	| Canny | `controlnet/GeoSynth-Location-Canny` | ✅ ready |
	| SAM | `controlnet/GeoSynth-Location-SAM` | ⏳ checkpoint pending |
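
When selecting a variant programmatically, it can help to fail early for the SAM entry, whose checkpoint is not yet available. The sketch below mirrors the table above; `CONTROLNET_VARIANTS` and `controlnet_subfolder` are hypothetical names introduced for illustration, not part of this repository.

```python
# Control type -> (subfolder, checkpoint available?), taken from the table above.
CONTROLNET_VARIANTS = {
    "OSM": ("controlnet/GeoSynth-Location-OSM", True),
    "Canny": ("controlnet/GeoSynth-Location-Canny", True),
    "SAM": ("controlnet/GeoSynth-Location-SAM", False),  # checkpoint pending
}


def controlnet_subfolder(control_type: str) -> str:
    """Return the subfolder for a control type, or raise if it is unusable."""
    try:
        subfolder, ready = CONTROLNET_VARIANTS[control_type]
    except KeyError:
        known = ", ".join(CONTROLNET_VARIANTS)
        raise ValueError(
            f"unknown control type {control_type!r}; expected one of: {known}"
        ) from None
    if not ready:
        raise RuntimeError(f"the {control_type} checkpoint is not available yet")
    return subfolder


print(controlnet_subfolder("OSM"))  # controlnet/GeoSynth-Location-OSM
```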

	### Model Sources

	- Source: [GeoSynth](https://github.com/mvrl/GeoSynth)
	- Paper: [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637)
	- Base model: [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base)
	- Related: [GeoSynth-ControlNets](https://huggingface.co/BiliSakura/GeoSynth-ControlNets) (non-location models)

	## Usage

	CLI:
	```bash
	python inference_demo.py --control demo_images/GeoSynth-Location-OSM/input.jpeg --control_type OSM --lon -90.2 --lat 38.6
	```
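
To run the demo for more than one control type, the same command line can be assembled from Python. This is a sketch under two assumptions: that `--control_type Canny` is accepted in the same way as `OSM`, and that the demo images follow the `demo_images/GeoSynth-Location-<type>/input.jpeg` layout seen in this card. `build_command` is a hypothetical helper.

```python
import shlex


def build_command(control_type: str, lon: float = -90.2, lat: float = 38.6) -> list[str]:
    """Assemble the inference_demo.py argument list for one control type."""
    control = f"demo_images/GeoSynth-Location-{control_type}/input.jpeg"
    return [
        "python", "inference_demo.py",
        "--control", control,
        "--control_type", control_type,
        "--lon", str(lon),
        "--lat", str(lat),
    ]


# Print the commands; pass each list to subprocess.run(cmd, check=True) to execute.
for control_type in ("OSM", "Canny"):
    print(shlex.join(build_command(control_type)))
```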

	Python:
	```python
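	import os
	import sys

	# Make the repository root importable so geosynth_pipeline.py can be found.
	# __file__ is only defined when this runs as a script saved in the repo root.
	sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

	import torch
	from PIL import Image
	from geosynth_pipeline import load_geosynth_pipeline_with_location, run_with_location

	# Load the location-aware base pipeline with the OSM ControlNet from local files.
	pipe = load_geosynth_pipeline_with_location(
	    ".",
	    controlnet_subfolder="controlnet/GeoSynth-Location-OSM",
	    local_files_only=True,
	)
	pipe = pipe.to("cuda")  # requires a CUDA-capable GPU

	# The control image is an OSM tile, resized to 512x512.
	img = Image.open("demo_images/GeoSynth-Location-OSM/input.jpeg").convert("RGB").resize((512, 512))

	# Generate an image conditioned on the default location (St. Louis, MO).
	output = run_with_location(pipe, "Satellite image features a city neighborhood", image=img, lon=-90.2, lat=38.6)
	output.images[0].save("generated_city.jpg")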
	```
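
To see how the location input affects the result, the same control image and prompt can be rendered at several coordinates. The following is a minimal sketch, assuming `run_with_location` accepts the keyword arguments shown in the snippet above. `sweep_locations` and `LOCATIONS` are hypothetical names, and the coordinates other than St. Louis are approximate values chosen for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Approximate (lon, lat) pairs in decimal degrees; only St. Louis comes from this card.
LOCATIONS: Dict[str, Tuple[float, float]] = {
    "st_louis": (-90.2, 38.6),
    "new_york": (-74.0, 40.7),
    "san_francisco": (-122.4, 37.8),
}


def sweep_locations(
    run_fn: Callable[..., Any],
    pipe: Any,
    prompt: str,
    image: Any,
    locations: Dict[str, Tuple[float, float]] = LOCATIONS,
) -> Dict[str, Any]:
    """Call run_fn once per location, keeping prompt and control image fixed."""
    results = {}
    for name, (lon, lat) in locations.items():
        results[name] = run_fn(pipe, prompt, image=image, lon=lon, lat=lat)
    return results


# Intended use with the objects from the snippet above:
# outputs = sweep_locations(run_with_location, pipe,
#                           "Satellite image features a city neighborhood", img)
# for name, out in outputs.items():
#     out.images[0].save(f"generated_{name}.jpg")
```

Passing the runner as an argument keeps the helper independent of the pipeline module, so it can be tried with a stub before loading any weights.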

	## Citation

	If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.

	```bibtex
	@inproceedings{sastry2024geosynth,
	title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
	author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
	booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
	year={2024}
	}

	@article{klemmer2025satclip,
	title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
	author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
	journal={Proceedings of the AAAI Conference on Artificial Intelligence},
	volume={39},
	number={4},
	pages={4347--4355},
	year={2025},
	doi={10.1609/aaai.v39i4.32457}
	}
	```