---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- controlnet
- remote-sensing
- arxiv:2404.06637
widget:
# GeoSynth-OSM: OSM tile -> satellite image
- src: demo_images/GeoSynth-OSM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-OSM/output.jpeg
# GeoSynth-Canny: Canny edges -> satellite image
- src: demo_images/GeoSynth-Canny/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-Canny/output.jpeg
# GeoSynth-SAM: SAM segmentation -> satellite image
- src: demo_images/GeoSynth-SAM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-SAM/output.jpeg
---

> [!WARNING]
> The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or unexpected output, please contact bili_sakura@zju.edu.cn.

# GeoSynth-ControlNets

We maintain **two repositories**—one per base checkpoint—each with its compatible ControlNets:

| Repo | Base Model | ControlNets |
|------|------------|-------------|
| **This repo** | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
| **[GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location)** | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |

*The [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) ControlNet checkpoint is missing from the upstream source.*

### This repository

1. **GeoSynth checkpoint** — A generative model for remote-sensing imagery. Its text encoder and UNet are identical to [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base) (not fine-tuned).
2. **ControlNet models** — OSM, Canny, and SAM conditioning, located under [`controlnet/`](controlnet/).

### Architecture note: location-conditioned models

Location-conditioned variants (GeoSynth-Location-*) use a **different base checkpoint** that adds a CoordNet branch. The branch takes `[lon, lat]` as input, passes it through a **SatCLIP** location encoder, then through a **CoordNet** (13 stacked cross-attention blocks, inner dim 256, 4 heads). ControlNet and CoordNet both condition the UNet. See the [GeoSynth paper](https://huggingface.co/papers/2404.06637) Figure 3.
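
The paper gives only this high-level description, so the following PyTorch sketch is a guess at the shape of such a module, not the actual implementation (`CoordNetSketch` and its internal layout are assumptions; the real code lives in the GeoSynth repository):

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """One block: cross-attention + feed-forward, each with a residual connection."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, context):
        x = x + self.attn(self.norm1(x), context, context)[0]
        x = x + self.ff(self.norm2(x))
        return x

class CoordNetSketch(nn.Module):
    """Hypothetical CoordNet: a learnable query cross-attends over the SatCLIP
    location embedding through 13 stacked blocks (inner dim 256, 4 heads)."""
    def __init__(self, satclip_dim: int = 256, dim: int = 256, depth: int = 13, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(satclip_dim, dim)            # project SatCLIP embedding to inner dim
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.blocks = nn.ModuleList(CrossAttentionBlock(dim, heads) for _ in range(depth))

    def forward(self, loc_emb):                            # loc_emb: (B, satclip_dim)
        context = self.proj(loc_emb).unsqueeze(1)          # (B, 1, dim)
        x = self.query.expand(loc_emb.shape[0], -1, -1)
        for blk in self.blocks:
            x = blk(x, context)
        return x                                           # (B, 1, dim) conditioning vector

emb = torch.randn(2, 256)   # stand-in for the SatCLIP embedding of [lon, lat]
out = CoordNetSketch()(emb)
print(out.shape)            # torch.Size([2, 1, 256])
```

The resulting vector would then condition the UNet alongside the ControlNet features, per Figure 3 of the paper.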

### ControlNet variants (this repo)

| Control | Subfolder | Status |
|---------|-----------|--------|
| OSM     | `controlnet/GeoSynth-OSM` | ✅ Integrated |
| Canny   | `controlnet/GeoSynth-Canny` | ✅ Integrated |
| SAM     | `controlnet/GeoSynth-SAM` | ✅ Integrated |

Use them with 🧨 [diffusers](#examples) or the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) repository.

### Model Sources

- **Source:** [GeoSynth](https://github.com/mvrl/GeoSynth)
- **Paper:** [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637)
- **Base model:** [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base)



## Examples

### Text-to-Image (base GeoSynth)

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets")
pipe = pipe.to("cuda")

image = pipe("Satellite image features a city neighborhood").images[0]
image.save("generated_city.jpg")
```

### ControlNet (diffusers integration)

Use the 🧨 diffusers `ControlNetModel` wrapper with `StableDiffusionControlNetPipeline`:

**GeoSynth-OSM** — synthesizes satellite images from OpenStreetMap tiles (RGB):

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-OSM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("osm_tile.jpeg")  # OSM tile (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
```

**GeoSynth-Canny** — synthesizes satellite images from Canny edge maps:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-Canny",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("canny_edges.jpeg")  # Canny edge image (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
```

**GeoSynth-SAM** — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-SAM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("sam_segmentation.jpeg")  # SAM mask (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
```
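
Likewise, the source does not document how the SAM conditioning masks were rendered. A common recipe is to paint each predicted mask a random solid color; `colorize_masks` below is a hypothetical helper (the actual preprocessing may differ), demonstrated on tiny synthetic masks:

```python
import numpy as np

def colorize_masks(masks, shape, seed=0):
    """Paint each boolean mask a random solid color, largest regions first,
    so smaller masks overwrite the larger regions that contain them."""
    canvas = np.zeros((*shape, 3), dtype=np.uint8)
    rng = np.random.default_rng(seed)
    for m in sorted(masks, key=lambda m: int(m["segmentation"].sum()), reverse=True):
        canvas[m["segmentation"]] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return canvas

# Tiny synthetic example: two overlapping boolean masks on a 4x4 grid.
big = np.zeros((4, 4), dtype=bool); big[:, :3] = True
small = np.zeros((4, 4), dtype=bool); small[0, 0] = True
mask_rgb = colorize_masks([{"segmentation": big}, {"segmentation": small}], (4, 4))
```

With the official `segment-anything` package you would feed it the output of `SamAutomaticMaskGenerator(...).generate(image)`, whose entries each carry a boolean `"segmentation"` array, then save the colored result as the conditioning image.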

*For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate [GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location) repo.*

## Citation

If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.

```bibtex
@inproceedings{sastry2024geosynth,
  title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
  author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
  booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
  year={2024}
}

@article{klemmer2025satclip,
  title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
  author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={4},
  pages={4347--4355},
  year={2025},
  doi={10.1609/aaai.v39i4.32457}
}
```