---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- controlnet
- remote-sensing
- arxiv:2404.06637
widget:
# GeoSynth-OSM: OSM tile -> satellite image
- src: demo_images/GeoSynth-OSM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-OSM/output.jpeg
# GeoSynth-Canny: Canny edges -> satellite image
- src: demo_images/GeoSynth-Canny/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-Canny/output.jpeg
# GeoSynth-SAM: SAM segmentation -> satellite image
- src: demo_images/GeoSynth-SAM/input.jpeg
  prompt: Satellite image features a city neighborhood
  output:
    url: demo_images/GeoSynth-SAM/output.jpeg
---

> [!WARNING]
> The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or undesired outputs, please contact me at bili_sakura@zju.edu.cn.

# GeoSynth-ControlNets

We maintain **two repositories**—one per base checkpoint—each with its compatible ControlNets:

| Repo | Base Model | ControlNets |
|------|------------|-------------|
| **This repo** | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
| **[GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location)** | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |

*The [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) ControlNet checkpoint is missing from the source release.*

### This repository

1. **GeoSynth checkpoint**: a generative model for remote sensing imagery. Its text encoder and UNet are identical to those of [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base) (not fine-tuned).
2. **ControlNet models**: OSM, Canny, and SAM conditioning, located under [`controlnet/`](controlnet/).

### Architecture note: location-conditioned models

Location-conditioned variants (GeoSynth-Location-*) use a **different base checkpoint** that adds a CoordNet branch. The branch takes `[lon, lat]` as input, passes it through a **SatCLIP** location encoder, then through a **CoordNet** (13 stacked cross-attention blocks, inner dim 256, 4 heads). ControlNet and CoordNet both condition the UNet. See the [GeoSynth paper](https://huggingface.co/papers/2404.06637) Figure 3.
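Because the CoordNet branch takes `[lon, lat]` (longitude first), while many geocoders and GIS tools report latitude first, the two are easy to swap. The snippet below is a minimal sketch of a hypothetical helper, `to_lonlat` (not part of this repo, diffusers, or SatCLIP), that builds the pair in the order described above and range-checks it. It only illustrates the ordering convention; how the location-conditioned pipeline consumes coordinates is defined in the separate GeoSynth-ControlNets-Location repo.

```python
from typing import List


def to_lonlat(*, lat: float, lon: float) -> List[float]:
    """Return [lon, lat], the order the location branch expects (hypothetical helper)."""
    # Keyword-only arguments force call sites to name lat and lon explicitly,
    # so the reordering into [lon, lat] happens in exactly one place.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude must be in [-90, 90], got {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude must be in [-180, 180], got {lon}")
    return [float(lon), float(lat)]


# New York City is at roughly lat 40.71, lon -74.01.
print(to_lonlat(lat=40.71, lon=-74.01))  # [-74.01, 40.71]
```

Making the arguments keyword-only means a positional call such as `to_lonlat(40.71, -74.01)` fails loudly instead of silently producing a swapped pair.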

### ControlNet variants (this repo)

| Control | Subfolder | Status |
|---------|-----------|--------|
| OSM     | `controlnet/GeoSynth-OSM` | ✅ Integrated |
| Canny   | `controlnet/GeoSynth-Canny` | ✅ Integrated |
| SAM     | `controlnet/GeoSynth-SAM` | ✅ Integrated |

These models can be used with 🧨 [diffusers](#examples) or the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) repository.

### Model Sources

- **Source:** [GeoSynth](https://github.com/mvrl/GeoSynth)
- **Paper:** [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637)
- **Base model:** [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base)


## Examples

### Text-to-Image (base GeoSynth)

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets")
pipe = pipe.to("cuda")

image = pipe("Satellite image features a city neighborhood").images[0]
image.save("generated_city.jpg")
```

### ControlNet (diffusers integration)

Use the 🧨 diffusers `ControlNetModel` wrapper with `StableDiffusionControlNetPipeline`:

**GeoSynth-OSM** — synthesizes satellite images from OpenStreetMap tiles (RGB):

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-OSM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("osm_tile.jpeg").convert("RGB")  # OSM tile (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe(
    "Satellite image features a city neighborhood",
    image=img,
    generator=generator,
    num_inference_steps=20,
).images[0]
image.save("generated_city_osm.jpg")
```
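The examples assume the control image is already a 512x512 RGB image. If your tiles arrive in another size or mode (for example palette PNGs or grayscale edge maps), a minimal Pillow-only sketch like the following can normalize them first. The helper name `prepare_control_image` is hypothetical, and nearest-neighbor resampling (chosen so the flat colors of OSM tiles and SAM masks are not blended into new colors) is an assumption, not a documented requirement of these checkpoints.

```python
from PIL import Image


def prepare_control_image(img: Image.Image, size: int = 512) -> Image.Image:
    """Convert a conditioning image to RGB at size x size (hypothetical helper)."""
    # Palette ("P"), grayscale ("L") or RGBA inputs become 3-channel RGB,
    # matching the RGB control images used in the examples.
    img = img.convert("RGB")
    if img.size != (size, size):
        # NEAREST keeps discrete label colors intact; this is an assumed choice.
        img = img.resize((size, size), resample=Image.NEAREST)
    return img


# A 256x256 grayscale image becomes a 512x512 RGB control image.
demo = prepare_control_image(Image.new("L", (256, 256), color=128))
print(demo.mode, demo.size)  # RGB (512, 512)
```

The result can be passed directly as `image=` to the pipeline, e.g. `image=prepare_control_image(Image.open("osm_tile.jpeg"))`.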

**GeoSynth-Canny** — synthesizes satellite images from Canny edge maps:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-Canny",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("canny_edges.jpeg").convert("RGB")  # Canny edge image (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe(
    "Satellite image features a city neighborhood",
    image=img,
    generator=generator,
    num_inference_steps=20,
).images[0]
image.save("generated_city_canny.jpg")
```

**GeoSynth-SAM** — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-SAM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("sam_segmentation.jpeg").convert("RGB")  # SAM mask (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe(
    "Satellite image features a city neighborhood",
    image=img,
    generator=generator,
    num_inference_steps=20,
).images[0]
image.save("generated_city_sam.jpg")
```
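The SAM example expects a single segmentation image rather than raw masks. Segment Anything's automatic mask generator returns a list of dicts, each with a boolean `segmentation` array and an integer `area`. One way to turn that into an RGB control image is to paint each mask with a random color, largest first, so small segments stay visible on top. This is a sketch under the assumption that GeoSynth-SAM was trained on color-coded masks of this kind; the exact coloring used for its training data is not documented here, and `masks_to_rgb` is a hypothetical helper.

```python
from typing import Dict, List

import numpy as np


def masks_to_rgb(masks: List[Dict], height: int, width: int, seed: int = 0) -> np.ndarray:
    """Paint SAM-style masks into an (H, W, 3) uint8 image, one random color per segment."""
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.uint8)  # unlabeled pixels stay black
    # Paint large segments first so smaller ones drawn later remain visible on top.
    for mask in sorted(masks, key=lambda d: d["area"], reverse=True):
        # Channels drawn from 1..255, so no segment is colored pure black.
        color = rng.integers(1, 256, size=3).astype(np.uint8)
        canvas[mask["segmentation"]] = color
    return canvas


# Two toy masks: a large background region and a small square inside it.
big = np.ones((4, 4), dtype=bool)
small = np.zeros((4, 4), dtype=bool)
small[1:3, 1:3] = True
rgb = masks_to_rgb(
    [{"segmentation": small, "area": 4}, {"segmentation": big, "area": 16}], 4, 4
)
print(rgb.shape)  # (4, 4, 3)
```

Convert the array with `Image.fromarray(rgb)` before passing it to the pipeline as `image=`.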

*For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate [GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location) repo.*
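The three ControlNet examples above share one base checkpoint and differ only in their subfolder, so the loading code can be factored into a small helper. The sketch below assumes the subfolder names from the variants table; `controlnet_subfolder` and `load_geosynth_pipeline` are hypothetical helpers, while `ControlNetModel.from_pretrained(..., subfolder=...)` and `StableDiffusionControlNetPipeline.from_pretrained(..., controlnet=...)` are the same diffusers calls used in the examples.

```python
REPO_ID = "BiliSakura/GeoSynth-ControlNets"

# Subfolders of the ControlNets in this repo, taken from the variants table.
CONTROLNET_SUBFOLDERS = {
    "osm": "controlnet/GeoSynth-OSM",
    "canny": "controlnet/GeoSynth-Canny",
    "sam": "controlnet/GeoSynth-SAM",
}


def controlnet_subfolder(kind: str) -> str:
    """Map a control type ("osm", "canny", "sam") to its subfolder (hypothetical helper)."""
    key = kind.strip().lower()
    if key not in CONTROLNET_SUBFOLDERS:
        raise ValueError(
            f"unknown control type {kind!r}; expected one of {sorted(CONTROLNET_SUBFOLDERS)}"
        )
    return CONTROLNET_SUBFOLDERS[key]


def load_geosynth_pipeline(kind: str, device: str = "cuda"):
    """Build a StableDiffusionControlNetPipeline for one control type (hypothetical helper)."""
    # Imported here so the subfolder lookup above works without diffusers installed.
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(REPO_ID, subfolder=controlnet_subfolder(kind))
    pipe = StableDiffusionControlNetPipeline.from_pretrained(REPO_ID, controlnet=controlnet)
    return pipe.to(device)
```

Usage: `pipe = load_geosynth_pipeline("canny")`, then call `pipe(prompt, image=..., ...)` exactly as in the examples. Validating the control type up front gives a clear error before any weights are downloaded.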

## Citation

If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.

```bibtex
@inproceedings{sastry2024geosynth,
  title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
  author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
  booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
  year={2024}
}

@article{klemmer2025satclip,
  title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
  author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={4},
  pages={4347--4355},
  year={2025},
  doi={10.1609/aaai.v39i4.32457}
}
```