Improve model card: pipeline tag, paper link, and usage example
This PR enhances the model card for UrbanFusion by:
- Updating the `pipeline_tag` from `any-to-any` to `feature-extraction`. This more accurately reflects the model's function of generating spatial representations and improves its discoverability on the Hub.
- Adding an explicit link to the paper, [UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations](https://huggingface.co/papers/2510.13774), in the introductory description.
- Including a "Minimal Usage Example" section with a Python code snippet, directly sourced from the GitHub repository, to help users quickly get started with generating location embeddings.
- Updating the BibTeX citation to include the correct arXiv ID and URL for the paper.
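
A note on the input convention used by the usage example in the diff below: modalities are passed as a fixed-order list (0=coords, 1=SV, 2=RS, 3=OSM, 4=POI), and `mask_indices` names the positions the encoder should ignore. The tiny stand-in below only illustrates that convention; `active_modalities` is a hypothetical helper, not part of the UrbanFusion API.

```python
# Illustrative sketch (NOT the UrbanFusion API): which entries of the
# fixed-order inputs list survive the mask used in the usage example.
# Order convention from the snippet: 0=coords, 1=SV, 2=RS, 3=OSM, 4=POI.
def active_modalities(inputs, mask_indices):
    """Return the (index, value) pairs the encoder would actually consume."""
    masked = set(mask_indices)
    return [(i, x) for i, x in enumerate(inputs) if i not in masked]

# Coordinates-only setup, mirroring the snippet in the diff.
inputs = ["coords", "sv", "rs", "osm", "poi"]
print(active_modalities(inputs, mask_indices=[1, 2, 3, 4]))  # [(0, 'coords')]
```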
@@ -1,13 +1,13 @@
 ---
-license: cc-by-4.0
-datasets:
-- DominikM198/PP2-M
 base_model:
 - openai/clip-vit-large-patch14
 - BAAI/bge-small-en-v1.5
 - torchgeo/vit_small_patch16_224_sentinel2_all_moco
 - DominikM198/OSM-MAE
-pipeline_tag: any-to-any
+datasets:
+- DominikM198/PP2-M
+license: cc-by-4.0
+pipeline_tag: feature-extraction
 tags:
 - SpatialRepresentationLearning
 - GeoFoundationModel
@@ -15,9 +15,10 @@ tags:
 - ContrastiveLearning
 - Mutlimodal
 ---
+
 # UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations
 
-This repository provides the **pretrained weights** of the **UrbanFusion** model — a framework for learning robust spatial representations through stochastic multimodal fusion.
+This repository provides the **pretrained weights** of the **UrbanFusion** model — a framework for learning robust spatial representations through stochastic multimodal fusion, as presented in the paper [UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations](https://huggingface.co/papers/2510.13774).
 
 UrbanFusion can generate **location encodings** from *any subset* of the following modalities:
 - 📍 Geographic coordinates
@@ -26,7 +27,41 @@
 - 🗺️ OSM basemaps
 - 🏬 Points of interest (POIs)
 
-🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion)
+🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion).
+
+---
+
+## Minimal Usage Example
+Using pretrained models for location encoding is straightforward. The example below demonstrates how to load the model and generate representations based solely on geographic coordinates (latitude and longitude), without requiring any additional input modalities.
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from srl.multi_modal_encoder.load import get_urbanfusion
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# Coordinates: batch of 32 (lat, lon) pairs
+coords = torch.randn(32, 2).to(device)
+
+# Placeholders for other modalities (SV, RS, OSM, POI)
+placeholder = torch.empty(32).to(device)
+inputs = [coords, placeholder, placeholder, placeholder, placeholder]
+
+# Mask all but coordinates (indices: 0=coords, 1=SV, 2=RS, 3=OSM, 4=POI)
+mask_indices = [1, 2, 3, 4]
+
+# Load pretrained UrbanFusion model
+ckpt = hf_hub_download("DominikM198/UrbanFusion", "UrbanFusion/UrbanFusion.ckpt")
+model = get_urbanfusion(ckpt, device=device).eval()
+
+# Encode inputs (output shape: [32, 768])
+with torch.no_grad():
+    embeddings = model(inputs, mask_indices=mask_indices, return_representations=True).cpu()
+```
+For a more comprehensive guide—including instructions on applying the model to downstream tasks and incorporating additional modalities (with options for downloading, preprocessing, and using contextual prompts with or without precomputed features)—see the following tutorials:
+
+- [`UrbanFusion_coordinates_only.ipynb`](https://github.com/DominikM198/UrbanFusion/blob/main/tutorials/UrbanFusion_coordinates_only.ipynb)
+- [`UrbanFusion_multimodal.ipynb`](https://github.com/DominikM198/UrbanFusion/blob/main/tutorials/UrbanFusion_multimodal.ipynb)
 
 ---
 
@@ -37,7 +72,9 @@
 title = {UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations},
 author = {Dominik J. Mühlematter and Lin Che and Ye Hong and Martin Raubal and Nina Wiedemann},
 year = {2025},
-journal = {arXiv preprint arXiv:
+journal = {arXiv preprint arXiv:2510.13774},
+eprint = {2510.13774},
+archivePrefix = {arXiv},
+url = {https://arxiv.org/abs/2510.13774},
 }
-```
----
+```