---
license: cc-by-4.0
datasets:
- DominikM198/PP2-M
base_model:
- openai/clip-vit-large-patch14
- BAAI/bge-small-en-v1.5
- torchgeo/vit_small_patch16_224_sentinel2_all_moco
- DominikM198/OSM-MAE
pipeline_tag: feature-extraction
tags:
- SpatialRepresentationLearning
- GeoFoundationModel
- GeoFM
- ContrastiveLearning
- Multimodal
---
# UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations

This repository provides the pretrained weights of the UrbanFusion model, a framework for learning robust spatial representations through stochastic multimodal fusion.
UrbanFusion can generate location encodings from any subset of the following modalities:
- 📍 Geographic coordinates
- 🏙️ Street-view imagery
- 🛰️ Remote sensing data
- 🗺️ OSM basemaps
- 🏬 Points of interest (POIs)
🔗 The full source code is available on GitHub, and further details are described in our paper.
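
The core idea, fusing location encodings from any subset of the modalities listed above, can be illustrated with a minimal, framework-agnostic sketch. The function names and the mean-pooling fusion below are illustrative assumptions for exposition only; the actual architecture is in the GitHub repository and the paper.

```python
import random

def fuse_modalities(embeddings, training=True, p_drop=0.5, rng=None):
    """Illustrative sketch: fuse per-modality embeddings by mean pooling.

    During training, modalities are randomly dropped (stochastic fusion),
    so the encoder learns to produce useful representations from any
    subset of inputs. This mirrors the idea, not UrbanFusion's exact code.
    """
    rng = rng or random.Random()
    names = list(embeddings)
    if training:
        kept = [n for n in names if rng.random() > p_drop]
        if not kept:
            # Always keep at least one modality so fusion is well defined.
            kept = [rng.choice(names)]
    else:
        # At inference, use every modality that is available.
        kept = names
    dim = len(next(iter(embeddings.values())))
    fused = [sum(embeddings[n][i] for n in kept) / len(kept) for i in range(dim)]
    return fused, kept

# Example: two available modalities, evaluation mode (no dropout).
vecs = {"coordinates": [1.0, 3.0], "street_view": [3.0, 5.0]}
fused, kept = fuse_modalities(vecs, training=False)
```

At inference time the sketch simply averages whatever modality embeddings are present, which is why a model trained this way degrades gracefully when, say, street-view imagery is unavailable for a location.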
## 📖 Citation

```bibtex
@article{muehlematter2025urbanfusion,
  title   = {UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations},
  author  = {Dominik J. M{\"u}hlematter and Lin Che and Ye Hong and Martin Raubal and Nina Wiedemann},
  journal = {arXiv preprint arXiv:2510.13774},
  year    = {2025}
}
```