---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- controlnet
- remote-sensing
- arxiv:2404.06637
widget:
# GeoSynth-OSM: OSM tile -> satellite image
- src: demo_images/GeoSynth-OSM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-OSM/output.jpeg
# GeoSynth-Canny: Canny edges -> satellite image
- src: demo_images/GeoSynth-Canny/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-Canny/output.jpeg
# GeoSynth-SAM: SAM segmentation -> satellite image
- src: demo_images/GeoSynth-SAM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-SAM/output.jpeg
---
> [!WARNING]
> Checkpoint conversion has not been fully validated. If you encounter a pipeline loading failure or unexpected output, please contact me at bili_sakura@zju.edu.cn.
# GeoSynth-ControlNets
We maintain **two repositories**—one per base checkpoint—each with its compatible ControlNets:
| Repo | Base Model | ControlNets |
|------|------------|-------------|
| **This repo** | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
| **[GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location)** | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |
*The [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) ControlNet checkpoint is missing from the source release.*
### This repository
1. **GeoSynth checkpoint** — A remote sensing visual generative model. The text encoder and UNet are the same as [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base) (not fine-tuned).
2. **ControlNet models** — OSM, Canny, and SAM conditioning, located under [`controlnet/`](controlnet/).
### Architecture note: location-conditioned models
Location-conditioned variants (GeoSynth-Location-*) use a **different base checkpoint** that adds a CoordNet branch. The branch takes `[lon, lat]` as input, passes it through a **SatCLIP** location encoder, then through a **CoordNet** (13 stacked cross-attention blocks, inner dim 256, 4 heads). ControlNet and CoordNet both condition the UNet. See the [GeoSynth paper](https://huggingface.co/papers/2404.06637) Figure 3.
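To make the branch concrete, here is a minimal, illustrative PyTorch sketch of the CoordNet structure described above. It is *not* the official implementation: the SatCLIP location encoder is stubbed with a linear layer, and the text-conditioning dimension (`ctx_dim=1024`) is an assumption matching SD 2.1's text encoder.

```python
import torch
from torch import nn

class CoordNetSketch(nn.Module):
    """Illustrative sketch (not the official implementation): a location
    embedding is refined by 13 stacked cross-attention blocks
    (inner dim 256, 4 heads) attending over the text conditioning."""

    def __init__(self, ctx_dim=1024, inner_dim=256, heads=4, depth=13):
        super().__init__()
        # Stand-in for the pretrained SatCLIP location encoder:
        # maps [lon, lat] to an embedding.
        self.loc_encoder = nn.Linear(2, inner_dim)
        self.ctx_proj = nn.Linear(ctx_dim, inner_dim)
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(inner_dim, heads, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, lonlat, text_ctx):
        # lonlat: (B, 2); text_ctx: (B, L, ctx_dim) text-encoder states
        q = self.loc_encoder(lonlat).unsqueeze(1)  # (B, 1, inner_dim)
        kv = self.ctx_proj(text_ctx)               # (B, L, inner_dim)
        for attn in self.blocks:
            out, _ = attn(q, kv, kv)
            q = q + out                            # residual refinement
        return q                                   # location conditioning

net = CoordNetSketch()
cond = net(torch.zeros(2, 2), torch.zeros(2, 77, 1024))
print(cond.shape)  # torch.Size([2, 1, 256])
```

In the actual model, this conditioning signal is injected into the UNet alongside the ControlNet residuals.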
### ControlNet variants (this repo)
| Control | Subfolder | Status |
|---------|-----------|--------|
| OSM | `controlnet/GeoSynth-OSM` | ✅ Integrated |
| Canny | `controlnet/GeoSynth-Canny` | ✅ Integrated |
| SAM | `controlnet/GeoSynth-SAM` | ✅ Integrated |
Use these models with 🧨 [diffusers](#examples) or the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) repository.
### Model Sources
- **Source:** [GeoSynth](https://github.com/mvrl/GeoSynth)
- **Paper:** [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637)
- **Base model:** [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base)
## Examples
### Text-to-Image (base GeoSynth)
```python
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets")
pipe = pipe.to("cuda")
image = pipe("Satellite image features a city neighborhood").images[0]
image.save("generated_city.jpg")
```
### ControlNet (diffusers integration)
Use the 🧨 diffusers `ControlNetModel` wrapper with `StableDiffusionControlNetPipeline`:
**GeoSynth-OSM** — synthesizes satellite images from OpenStreetMap tiles (RGB):
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch
controlnet = ControlNetModel.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
subfolder="controlnet/GeoSynth-OSM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("osm_tile.jpeg") # OSM tile (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
```
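If you need an OSM tile to condition on, the standard slippy-map formula converts a lon/lat to tile coordinates on the public OSM tile server. A small stdlib-only sketch (the example coordinates are arbitrary; resize the downloaded tile to 512x512 RGB before passing it to the pipeline):

```python
import math

def deg2num(lat_deg, lon_deg, zoom):
    """Standard OSM slippy-map formula: lon/lat -> tile (x, y) at a zoom level."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile

x, y = deg2num(47.6062, -122.3321, 15)  # example: Seattle at zoom 15
url = f"https://tile.openstreetmap.org/15/{x}/{y}.png"
# Download e.g. with urllib.request.urlretrieve(url, "osm_tile.png")
# (set a User-Agent header per the OSM tile usage policy).
print(url)
```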
**GeoSynth-Canny** — synthesizes satellite images from Canny edge maps:
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch
controlnet = ControlNetModel.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
subfolder="controlnet/GeoSynth-Canny",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("canny_edges.jpeg") # Canny edge image (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
```
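If you don't already have an edge map, one can be derived from any RGB satellite image. The usual tool is OpenCV's `cv2.Canny`; as a dependency-light stand-in, here is a Pillow-only sketch (the white placeholder image is just for illustration, so it yields an empty edge map):

```python
from PIL import Image, ImageFilter

# Replace this placeholder with a real satellite image.
src = Image.new("RGB", (512, 512), "white")

# Rough edge map via Pillow; cv2.Canny(np.array(src), 100, 200)
# generally produces cleaner conditioning.
edges = src.convert("L").filter(ImageFilter.FIND_EDGES).convert("RGB")
edges.save("canny_edges.jpeg")
```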
**GeoSynth-SAM** — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks:
```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch
controlnet = ControlNetModel.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
subfolder="controlnet/GeoSynth-SAM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("sam_segmentation.jpeg") # SAM mask (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
```
*For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate [GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location) repo.*
## Citation
If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.
```bibtex
@inproceedings{sastry2024geosynth,
title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
year={2024}
}
@article{klemmer2025satclip,
title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={4},
pages={4347--4355},
year={2025},
doi={10.1609/aaai.v39i4.32457}
}
``` |