---
license: cc-by-4.0
datasets:
- DominikM198/PP2-M
base_model:
- openai/clip-vit-large-patch14
- BAAI/bge-small-en-v1.5
- torchgeo/vit_small_patch16_224_sentinel2_all_moco
- DominikM198/OSM-MAE
pipeline_tag: feature-extraction
tags:
- SpatialRepresentationLearning
- GeoFoundationModel
- GeoFM
- ContrastiveLearning
- Multimodal
---

# UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations

This repository provides the **pretrained weights** of the **UrbanFusion** model, a framework for learning robust spatial representations through stochastic multimodal fusion. UrbanFusion can generate **location encodings** from *any subset* of the following modalities:

- 📍 Geographic coordinates
- 🏙️ Street-view imagery
- 🛰️ Remote sensing data
- 🗺️ OSM basemaps
- 🏬 Points of interest (POIs)

🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion), and further details are described in our paper.

---

## 📖 Citation

```bibtex
@article{muehlematter2025urbanfusion,
  title   = {UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations},
  author  = {Dominik J. Mühlematter and Lin Che and Ye Hong and Martin Raubal and Nina Wiedemann},
  year    = {2025},
  journal = {arXiv preprint arXiv:2510.13774}
}
```

---
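To illustrate the idea of fusing embeddings from an arbitrary subset of modalities, here is a minimal, hypothetical sketch (not the actual UrbanFusion architecture; the masked mean pooling below is an assumption for illustration): each available modality contributes an embedding, and a presence mask selects which ones enter the fused location encoding.

```python
import numpy as np

def fuse(embeddings, mask):
    """Masked mean over whichever modality embeddings are present.

    embeddings: (n_modalities, dim) array of per-modality embeddings
    mask: (n_modalities,) boolean array, True where the modality is available

    NOTE: illustrative only -- UrbanFusion's actual fusion module is
    defined in the GitHub repository linked above.
    """
    m = mask.astype(float)[:, None]           # (n_modalities, 1) weights
    denom = max(m.sum(), 1.0)                 # avoid division by zero
    return (embeddings * m).sum(axis=0) / denom

# Example: 5 modalities (coordinates, street view, remote sensing,
# OSM basemap, POIs), 16-dimensional embeddings, only 2 available.
emb = np.random.randn(5, 16)
mask = np.array([True, False, True, False, False])
fused = fuse(emb, mask)
print(fused.shape)  # (16,)
```

Because the mask is applied at pooling time, the same function handles any subset of modalities, which mirrors the "any subset" property described above.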