---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- controlnet
- remote-sensing
- arxiv:2404.06637
widget:
# GeoSynth-Location-OSM: OSM tile -> satellite image (default lon=-90.2, lat=38.6)
- src: demo_images/GeoSynth-Location-OSM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-Location-OSM/output.jpeg
# GeoSynth-Location-Canny: Canny edges -> satellite image (default lon=-90.2, lat=38.6)
- src: demo_images/GeoSynth-Location-Canny/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-Location-Canny/output.jpeg
---

> [!WARNING] The checkpoint conversion has not been fully validated. If you encounter a pipeline loading failure or undesired output, please contact me at bili_sakura@zju.edu.cn.

# GeoSynth-ControlNets-Location

Repository for location-conditioned GeoSynth ControlNets.

**Location (lon/lat) conditioning** is the primary workflow for geo-aware synthesis. Default: St. Louis, MO (`lon=-90.2`, `lat=38.6`). Use SatCLIP + CoordNet for full diffusers-style location conditioning.

We maintain **two repositories**—one per base checkpoint—each with its compatible ControlNets:

| Repo | Base Model | ControlNets |
|------|------------|-------------|
| **[GeoSynth-ControlNets](https://huggingface.co/BiliSakura/GeoSynth-ControlNets)** | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
| **This repo** | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |

*The ControlNet checkpoint for [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) is missing from the source repository.*

### This repository

1. **GeoSynth-Location base**: Converted from `geosynth_sd_loc-v3.ckpt` to diffusers format. The text encoder and UNet are the same as SD 2.1 (not fine-tuned). The original checkpoint also includes a CoordNet branch for `[lon, lat]` conditioning (see Architecture).
2. **ControlNet models**: GeoSynth-Location-OSM and GeoSynth-Location-Canny, converted from the SD-style checkpoints under `MVRL/GeoSynth-Location-OSM` and `MVRL/GeoSynth-Location-Canny` and located under [`controlnet/`](controlnet/). GeoSynth-Location-SAM is assigned the subfolder `controlnet/GeoSynth-Location-SAM`, but its checkpoint is missing from the source, so it is not yet usable.

### Architecture

The full location pipeline adds a **CoordNet** branch to the base LDM:

- **Input**: `[lon, lat]` → **SatCLIP** location encoder → **CoordNet** (13 stacked cross-attention blocks, inner dim 256, 4 heads) → conditioning injected into UNet
- ControlNet and CoordNet jointly condition the UNet (see [GeoSynth paper](https://huggingface.co/papers/2404.06637) Figure 3)

### ControlNet variants (this repo)

| Control | Subfolder | Status |
|---------|-----------|--------|
| OSM     | `controlnet/GeoSynth-Location-OSM` | ✅ ready |
| Canny   | `controlnet/GeoSynth-Location-Canny` | ✅ ready |
| SAM     | `controlnet/GeoSynth-Location-SAM` | ⏳ ckpt pending |
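
The table above can be turned into a small lookup so that scripts fail early when a pending checkpoint is requested. This is only a sketch: `CONTROLNETS` and `resolve_controlnet_subfolder` are hypothetical names that are not part of this repository, and the readiness flags simply mirror the Status column.

```python
from typing import Dict, Tuple

# Hypothetical registry mirroring the table above: control type -> (subfolder, ready).
CONTROLNETS: Dict[str, Tuple[str, bool]] = {
    "OSM": ("controlnet/GeoSynth-Location-OSM", True),
    "Canny": ("controlnet/GeoSynth-Location-Canny", True),
    "SAM": ("controlnet/GeoSynth-Location-SAM", False),  # checkpoint pending
}


def resolve_controlnet_subfolder(control_type: str) -> str:
    """Return the subfolder for a control type, matching names case-insensitively."""
    lookup = {name.lower(): entry for name, entry in CONTROLNETS.items()}
    key = control_type.strip().lower()
    if key not in lookup:
        known = ", ".join(sorted(CONTROLNETS))
        raise ValueError(f"Unknown control type {control_type!r}; expected one of: {known}")
    subfolder, ready = lookup[key]
    if not ready:
        # Fail before any model loading is attempted.
        raise RuntimeError(f"The checkpoint for {control_type!r} is not available yet")
    return subfolder
```

The returned string is the kind of value passed as `controlnet_subfolder` in the Usage section.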

### Model Sources

- **Source:** [GeoSynth](https://github.com/mvrl/GeoSynth)
- **Paper:** [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637)
- **Base model:** [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base)
- **Related:** [GeoSynth-ControlNets](https://huggingface.co/BiliSakura/GeoSynth-ControlNets) (non-location models)

## Usage

**CLI:**
```bash
python inference_demo.py --control demo_images/GeoSynth-Location-OSM/input.jpeg --control_type OSM --lon -90.2 --lat 38.6
```

**Python:**
```python
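import os
import sys

# Make geosynth_pipeline.py (shipped in this repo) importable when run as a script.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

import torch
from PIL import Image
from geosynth_pipeline import load_geosynth_pipeline_with_location, run_with_location

# Load the base model plus the OSM ControlNet from the local repository checkout.
pipe = load_geosynth_pipeline_with_location(".", controlnet_subfolder="controlnet/GeoSynth-Location-OSM", local_files_only=True)
# Prefer the GPU; fall back to the CPU (much slower) when CUDA is unavailable.
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# The control image is an OSM tile, converted to RGB and resized to 512x512.
img = Image.open("demo_images/GeoSynth-Location-OSM/input.jpeg").convert("RGB").resize((512, 512))
# Location is given as longitude first, then latitude (default: St. Louis, MO).
output = run_with_location(pipe, "Satellite image features a city neighborhood", image=img, lon=-90.2, lat=38.6)
output.images[0].save("generated_city.jpg")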
```
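
Because the location is passed as longitude first and latitude second, swapped arguments are an easy mistake to make. The following is a minimal sketch of a range check that could run before calling the pipeline. `validate_lon_lat` is a hypothetical helper, not part of `geosynth_pipeline`, and the defaults are the St. Louis coordinates used throughout this README.

```python
import math
from typing import Tuple

# Defaults from this README: St. Louis, MO.
DEFAULT_LON, DEFAULT_LAT = -90.2, 38.6


def validate_lon_lat(lon: float = DEFAULT_LON, lat: float = DEFAULT_LAT) -> Tuple[float, float]:
    """Check that (lon, lat) are finite and within geographic bounds (hypothetical helper)."""
    lon, lat = float(lon), float(lat)
    if not (math.isfinite(lon) and math.isfinite(lat)):
        raise ValueError("lon and lat must be finite numbers")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"lon={lon} is outside [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        # A latitude beyond +/-90 often means lon and lat were swapped.
        raise ValueError(f"lat={lat} is outside [-90, 90]; were lon and lat swapped?")
    return lon, lat
```

The validated pair can then be forwarded as the `lon` and `lat` keyword arguments of `run_with_location`. Note that a swap is only caught when the misplaced longitude falls outside the latitude range, as it does for the default coordinates.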

## Citation

If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.

```bibtex
@inproceedings{sastry2024geosynth,
  title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
  author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
  booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
  year={2024}
}

@article{klemmer2025satclip,
  title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
  author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={4},
  pages={4347--4355},
  year={2025},
  doi={10.1609/aaai.v39i4.32457}
}
```