| --- |
| license: apache-2.0 |
| library_name: diffusers |
| pipeline_tag: image-to-image |
| tags: |
| - controlnet |
| - remote-sensing |
| - arxiv:2404.06637 |
| widget: |
| |
| - src: demo_images/GeoSynth-OSM/input.jpeg |
| prompt: Satellite image features a city neighborhood |
| output: |
| url: demo_images/GeoSynth-OSM/output.jpeg |
| |
| - src: demo_images/GeoSynth-Canny/input.jpeg |
| prompt: Satellite image features a city neighborhood |
| output: |
| url: demo_images/GeoSynth-Canny/output.jpeg |
| |
| - src: demo_images/GeoSynth-SAM/input.jpeg |
| prompt: Satellite image features a city neighborhood |
| output: |
| url: demo_images/GeoSynth-SAM/output.jpeg |
| --- |
| |
| > [!WARNING] The checkpoint conversion has not been fully validated. If the pipeline fails to load or produces undesired output, please contact me at bili_sakura@zju.edu.cn. |
| |
| # GeoSynth-ControlNets |
| |
| We maintain **two repositories**—one per base checkpoint—each with its compatible ControlNets: |
| |
| | Repo | Base Model | ControlNets | |
| |------|------------|-------------| |
| | **This repo** | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM | |
| | **[GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location)** | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny | |
| |
| *The [GeoSynth-Location-SAM](https://huggingface.co/MVRL/GeoSynth-Location-SAM) ControlNet checkpoint is missing from the upstream source.* |
| |
| ### This repository |
| |
| 1. **GeoSynth checkpoint** — A remote sensing visual generative model. The text encoder and UNet are the same as [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base) (not fine-tuned). |
| 2. **ControlNet models** — OSM, Canny, and SAM conditioning, located under [`controlnet/`](controlnet/). |
| |
| ### Architecture note: location-conditioned models |
| |
| Location-conditioned variants (GeoSynth-Location-*) use a **different base checkpoint** that adds a CoordNet branch. The branch takes `[lon, lat]` as input, passes it through a **SatCLIP** location encoder, then through a **CoordNet** (13 stacked cross-attention blocks, inner dim 256, 4 heads). ControlNet and CoordNet both condition the UNet. See the [GeoSynth paper](https://huggingface.co/papers/2404.06637) Figure 3. |
| |
| ### ControlNet variants (this repo) |
| |
| | Control | Subfolder | Status | |
| |---------|-----------|--------| |
| | OSM | `controlnet/GeoSynth-OSM` | ✅ Integrated | |
| | Canny | `controlnet/GeoSynth-Canny` | ✅ Integrated | |
| | SAM | `controlnet/GeoSynth-SAM` | ✅ Integrated | |
| |
| Use it with 🧨 [diffusers](#examples) or the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) repository. |
| |
| ### Model Sources |
| |
| - **Source:** [GeoSynth](https://github.com/mvrl/GeoSynth) |
| - **Paper:** [GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis](https://huggingface.co/papers/2404.06637) |
| - **Base model:** [Stable Diffusion 2.1](https://huggingface.co/sd2-community/stable-diffusion-2-1-base) |
| |
| |
| |
| ## Examples |
| |
| ### Text-to-Image (base GeoSynth) |
| |
| ```python |
| from diffusers import StableDiffusionPipeline |
| |
| pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets") |
| pipe = pipe.to("cuda") |
|
|
| image = pipe("Satellite image features a city neighborhood").images[0] |
| image.save("generated_city.jpg") |
| ``` |
| |
| ### ControlNet (diffusers integration) |
| |
| Use the 🧨 diffusers `ControlNetModel` wrapper with `StableDiffusionControlNetPipeline`: |
| |
| **GeoSynth-OSM** — synthesizes satellite images from OpenStreetMap tiles (RGB): |
| |
| ```python |
| from diffusers import StableDiffusionControlNetPipeline, ControlNetModel |
| from PIL import Image |
| import torch |
| |
| controlnet = ControlNetModel.from_pretrained( |
| "BiliSakura/GeoSynth-ControlNets", |
| subfolder="controlnet/GeoSynth-OSM", |
| ) |
| pipe = StableDiffusionControlNetPipeline.from_pretrained( |
| "BiliSakura/GeoSynth-ControlNets", |
| controlnet=controlnet, |
| ) |
| pipe = pipe.to("cuda") |
| img = Image.open("osm_tile.jpeg").convert("RGB") # OSM tile (RGB, 512x512) |
| generator = torch.manual_seed(42) |
| image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0] |
| image.save("generated_city.jpg") |
| ``` |
| |
| **GeoSynth-Canny** — synthesizes satellite images from Canny edge maps: |
|
|
| ```python |
| from diffusers import StableDiffusionControlNetPipeline, ControlNetModel |
| from PIL import Image |
| import torch |
| |
| controlnet = ControlNetModel.from_pretrained( |
| "BiliSakura/GeoSynth-ControlNets", |
| subfolder="controlnet/GeoSynth-Canny", |
| ) |
| pipe = StableDiffusionControlNetPipeline.from_pretrained( |
| "BiliSakura/GeoSynth-ControlNets", |
| controlnet=controlnet, |
| ) |
| pipe = pipe.to("cuda") |
| img = Image.open("canny_edges.jpeg").convert("RGB") # Canny edge image; convert in case it was saved as grayscale (RGB, 512x512) |
| generator = torch.manual_seed(42) |
| image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0] |
| image.save("generated_city.jpg") |
| ``` |
|
|
| **GeoSynth-SAM** — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks: |
|
|
| ```python |
| from diffusers import StableDiffusionControlNetPipeline, ControlNetModel |
| from PIL import Image |
| import torch |
| |
| controlnet = ControlNetModel.from_pretrained( |
| "BiliSakura/GeoSynth-ControlNets", |
| subfolder="controlnet/GeoSynth-SAM", |
| ) |
| pipe = StableDiffusionControlNetPipeline.from_pretrained( |
| "BiliSakura/GeoSynth-ControlNets", |
| controlnet=controlnet, |
| ) |
| pipe = pipe.to("cuda") |
| img = Image.open("sam_segmentation.jpeg").convert("RGB") # SAM mask (RGB, 512x512) |
| generator = torch.manual_seed(42) |
| image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0] |
| image.save("generated_city.jpg") |
| ``` |
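
The three ControlNet examples above all assume a 512x512 RGB conditioning image. If your OSM tile, edge map, or SAM mask has a different size or mode, a small preprocessing step along these lines may help. This is a minimal sketch using only Pillow; the helper name `prepare_condition` and the center-crop strategy are assumptions, not part of the GeoSynth code.

```python
from PIL import Image


def prepare_condition(image, size=512):
    """Center-crop to a square, resize to size x size, and convert to RGB (hypothetical helper)."""
    rgb = image.convert("RGB")
    width, height = rgb.size
    side = min(width, height)
    # Crop the largest centered square so the resize does not distort the image.
    left = (width - side) // 2
    top = (height - side) // 2
    square = rgb.crop((left, top, left + side, top + side))
    return square.resize((size, size))


# A synthetic grayscale image stands in for a real control image.
demo = Image.new("L", (800, 600), color=128)
condition = prepare_condition(demo)
print(condition.mode, condition.size)
```

For segmentation masks or other label-like inputs, you may prefer nearest-neighbor resampling (`Image.NEAREST`) so that resizing does not blend colors across region boundaries.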
|
|
| *For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate [GeoSynth-ControlNets-Location](https://huggingface.co/BiliSakura/GeoSynth-ControlNets-Location) repo.* |
|
|
| ## Citation |
|
|
| If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP. |
|
|
| ```bibtex |
| @inproceedings{sastry2024geosynth, |
| title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis}, |
| author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan}, |
| booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)}, |
| year={2024} |
| } |
| |
| @article{klemmer2025satclip, |
| title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery}, |
| author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc}, |
| journal={Proceedings of the AAAI Conference on Artificial Intelligence}, |
| volume={39}, |
| number={4}, |
| pages={4347--4355}, |
| year={2025}, |
| doi={10.1609/aaai.v39i4.32457} |
| } |
| ``` |