	---
	license: apache-2.0
	library_name: diffusers
	pipeline_tag: image-to-image
	tags:
	- controlnet
	- remote-sensing
	- arxiv:2404.06637
	widget:
	# GeoSynth-OSM: OSM tile -> satellite image
	- src: demo_images/GeoSynth-OSM/input.jpeg
	  prompt: Satellite image features a city neighborhood
	  output:
	    url: demo_images/GeoSynth-OSM/output.jpeg
	# GeoSynth-Canny: Canny edges -> satellite image
	- src: demo_images/GeoSynth-Canny/input.jpeg
	  prompt: Satellite image features a city neighborhood
	  output:
	    url: demo_images/GeoSynth-Canny/output.jpeg
	# GeoSynth-SAM: SAM segmentation -> satellite image
	- src: demo_images/GeoSynth-SAM/input.jpeg
	  prompt: Satellite image features a city neighborhood
	  output:
	    url: demo_images/GeoSynth-SAM/output.jpeg
	---

	> [!WARNING] The checkpoint conversion has not been fully validated. If you encounter a pipeline loading failure or undesired output, please contact me at bili_sakura@zju.edu.cn.

	# GeoSynth-ControlNets

	We maintain two repositories—one per base checkpoint—each with its compatible ControlNets:

	| Repo | Base Model | ControlNets |
	|------|------------|-------------|
	| This repo | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
	| [GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location) | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |

	\* The [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) ControlNet checkpoint is missing from the upstream source.

	### This repository

	1. GeoSynth checkpoint — A remote sensing visual generative model. The text encoder and UNet are the same as [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base) (not fine-tuned).
	2. ControlNet models — OSM, Canny, and SAM conditioning, located under [`controlnet/`](controlnet/).

	### Architecture note: location-conditioned models

	Location-conditioned variants (`GeoSynth-Location-*`) use a different base checkpoint that adds a CoordNet branch. The branch takes `[lon, lat]` as input, passes it through a SatCLIP location encoder, and then through a CoordNet (13 stacked cross-attention blocks, inner dim 256, 4 heads). Both the ControlNet and the CoordNet condition the UNet. See Figure 3 of the [GeoSynth paper](https://huggingface.co/papers/2404.06637).
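
Because the branch expects coordinates in `[lon, lat]` order, swapped inputs are an easy mistake to make. The helper below is a minimal sketch and is not part of GeoSynth or SatCLIP: the name `make_location` and the assumption that coordinates are given in decimal degrees are hypothetical, for illustration only. It checks the valid ranges and returns the pair in the order described above.

```python
from typing import List


def make_location(lon: float, lat: float) -> List[float]:
    """Return [lon, lat] after a basic range check (hypothetical helper).

    Assumes decimal degrees: longitude in [-180, 180], latitude in [-90, 90].
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Longitude first, then latitude, matching the [lon, lat] order above.
    return [float(lon), float(lat)]


location = make_location(lon=-90.2, lat=38.6)
print(location)  # [-90.2, 38.6]
```

Swapping the arguments, as in `make_location(38.6, -90.2)`, raises `ValueError` because -90.2 is not a valid latitude. A swap cannot be detected this way when both values fall within [-90, 90].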

	### ControlNet variants (this repo)

	| Control | Subfolder | Status |
	|---------|-----------|--------|
	| OSM | `controlnet/GeoSynth-OSM` | ✅ Integrated |
	| Canny | `controlnet/GeoSynth-Canny` | ✅ Integrated |
	| SAM | `controlnet/GeoSynth-SAM` | ✅ Integrated |

	Use these models with 🧨 [diffusers](#examples) or the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) repository.
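
The subfolder names in the table above can be kept in one place so that switching control types does not require editing paths by hand. This is a small sketch: the repository ID and subfolders are taken from this card, while `CONTROLNET_SUBFOLDERS` and `controlnet_subfolder` are hypothetical helper names and are not part of diffusers.

```python
REPO_ID = "BiliSakura/GeoSynth-ControlNets"

# Control type -> subfolder, copied from the table in this card.
CONTROLNET_SUBFOLDERS = {
    "osm": "controlnet/GeoSynth-OSM",
    "canny": "controlnet/GeoSynth-Canny",
    "sam": "controlnet/GeoSynth-SAM",
}


def controlnet_subfolder(control: str) -> str:
    """Look up the subfolder for a control type (case-insensitive)."""
    key = control.strip().lower()
    if key not in CONTROLNET_SUBFOLDERS:
        valid = ", ".join(sorted(CONTROLNET_SUBFOLDERS))
        raise ValueError(f"unknown control type {control!r}; expected one of: {valid}")
    return CONTROLNET_SUBFOLDERS[key]


print(controlnet_subfolder("OSM"))  # controlnet/GeoSynth-OSM
```

The returned string can be passed as the `subfolder` argument of `ControlNetModel.from_pretrained`, together with `REPO_ID`, as in the examples below.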

	### Model Sources

	- Source: [GeoSynth](https://github.com/mvrl/GeoSynth)
	- Paper: [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637)
	- Base model: [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base)



	## Examples

	### Text-to-Image (base GeoSynth)

	```python
	from diffusers import StableDiffusionPipeline

	pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets")
	pipe = pipe.to("cuda")

	image = pipe("Satellite image features a city neighborhood").images[0]
	image.save("generated_city.jpg")
	```

	### ControlNet (diffusers integration)

	Use the 🧨 diffusers `ControlNetModel` wrapper with `StableDiffusionControlNetPipeline`:

	GeoSynth-OSM — synthesizes satellite images from OpenStreetMap tiles (RGB):

	```python
	from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
	from PIL import Image
	import torch

	# Load the OSM ControlNet from its subfolder in this repo
	controlnet = ControlNetModel.from_pretrained(
	    "BiliSakura/GeoSynth-ControlNets",
	    subfolder="controlnet/GeoSynth-OSM",
	)
	# Attach it to the base GeoSynth pipeline
	pipe = StableDiffusionControlNetPipeline.from_pretrained(
	    "BiliSakura/GeoSynth-ControlNets",
	    controlnet=controlnet,
	)
	pipe = pipe.to("cuda")

	img = Image.open("osm_tile.jpeg").convert("RGB")  # OSM tile (RGB, 512x512)
	generator = torch.manual_seed(42)  # fixed seed for reproducibility
	image = pipe(
	    "Satellite image features a city neighborhood",
	    image=img,
	    generator=generator,
	    num_inference_steps=20,
	).images[0]
	image.save("generated_city.jpg")
	```

	GeoSynth-Canny — synthesizes satellite images from Canny edge maps:

	```python
	from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
	from PIL import Image
	import torch

	# Load the Canny ControlNet from its subfolder in this repo
	controlnet = ControlNetModel.from_pretrained(
	    "BiliSakura/GeoSynth-ControlNets",
	    subfolder="controlnet/GeoSynth-Canny",
	)
	# Attach it to the base GeoSynth pipeline
	pipe = StableDiffusionControlNetPipeline.from_pretrained(
	    "BiliSakura/GeoSynth-ControlNets",
	    controlnet=controlnet,
	)
	pipe = pipe.to("cuda")

	img = Image.open("canny_edges.jpeg").convert("RGB")  # Canny edge image (RGB, 512x512)
	generator = torch.manual_seed(42)  # fixed seed for reproducibility
	image = pipe(
	    "Satellite image features a city neighborhood",
	    image=img,
	    generator=generator,
	    num_inference_steps=20,
	).images[0]
	image.save("generated_city.jpg")
	```

	GeoSynth-SAM — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks:

	```python
	from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
	from PIL import Image
	import torch

	# Load the SAM ControlNet from its subfolder in this repo
	controlnet = ControlNetModel.from_pretrained(
	    "BiliSakura/GeoSynth-ControlNets",
	    subfolder="controlnet/GeoSynth-SAM",
	)
	# Attach it to the base GeoSynth pipeline
	pipe = StableDiffusionControlNetPipeline.from_pretrained(
	    "BiliSakura/GeoSynth-ControlNets",
	    controlnet=controlnet,
	)
	pipe = pipe.to("cuda")

	img = Image.open("sam_segmentation.jpeg").convert("RGB")  # SAM mask (RGB, 512x512)
	generator = torch.manual_seed(42)  # fixed seed for reproducibility
	image = pipe(
	    "Satellite image features a city neighborhood",
	    image=img,
	    generator=generator,
	    num_inference_steps=20,
	).images[0]
	image.save("generated_city.jpg")
	```
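
The examples above all expect an RGB conditioning image at 512x512. If your input has a different size or mode, it can be normalized first. The following is a minimal sketch using Pillow. The helper name `prepare_condition_image` is hypothetical, and center-cropping to a square before resizing is an assumption made here to avoid distorting the aspect ratio, not a preprocessing step documented by GeoSynth.

```python
from PIL import Image


def prepare_condition_image(img: Image.Image, size: int = 512) -> Image.Image:
    """Convert to RGB, center-crop to a square, and resize to size x size."""
    img = img.convert("RGB")
    width, height = img.size
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    # Crop the largest centered square, then resize it to the target size.
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size))


# In-memory demo; in practice, pass the result of Image.open(...) instead.
demo = Image.new("L", (640, 480), color=128)
prepared = prepare_condition_image(demo)
print(prepared.mode, prepared.size)  # RGB (512, 512)
```

The result can be passed as the `image` argument of the pipeline calls shown above.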

	For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate [GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location) repo.

	## Citation

	If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.

	```bibtex
	@inproceedings{sastry2024geosynth,
	  title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
	  author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
	  booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
	  year={2024}
	}

	@article{klemmer2025satclip,
	  title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
	  author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
	  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
	  volume={39},
	  number={4},
	  pages={4347--4355},
	  year={2025},
	  doi={10.1609/aaai.v39i4.32457}
	}
	```