Improve model card: pipeline tag, paper link, and usage example
#1 opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,13 +1,13 @@
 ---
-license: cc-by-4.0
-datasets:
-- DominikM198/PP2-M
 base_model:
 - openai/clip-vit-large-patch14
 - BAAI/bge-small-en-v1.5
 - torchgeo/vit_small_patch16_224_sentinel2_all_moco
 - DominikM198/OSM-MAE
-
+datasets:
+- DominikM198/PP2-M
+license: cc-by-4.0
+pipeline_tag: feature-extraction
 tags:
 - SpatialRepresentationLearning
 - GeoFoundationModel
@@ -15,9 +15,10 @@ tags:
 - ContrastiveLearning
 - Mutlimodal
 ---
+
 # UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations
 
-This repository provides the **pretrained weights** of the **UrbanFusion** model — a framework for learning robust spatial representations through stochastic multimodal fusion.
+This repository provides the **pretrained weights** of the **UrbanFusion** model — a framework for learning robust spatial representations through stochastic multimodal fusion, as presented in the paper [UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations](https://huggingface.co/papers/2510.13774).
 
 UrbanFusion can generate **location encodings** from *any subset* of the following modalities:
 - 📍 Geographic coordinates
@@ -26,7 +27,41 @@ UrbanFusion can generate **location encodings** from *any subset* of the followi
 - 🗺️ OSM basemaps
 - 🏬 Points of interest (POIs)
 
-🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion)
+🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion).
+
+---
+
+## Minimal Usage Example
+Using pretrained models for location encoding is straightforward. The example below demonstrates how to load the model and generate representations based solely on geographic coordinates (latitude and longitude), without requiring any additional input modalities.
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from srl.multi_modal_encoder.load import get_urbanfusion
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# Coordinates: batch of 32 (lat, lon) pairs
+coords = torch.randn(32, 2).to(device)
+
+# Placeholders for other modalities (SV, RS, OSM, POI)
+placeholder = torch.empty(32).to(device)
+inputs = [coords, placeholder, placeholder, placeholder, placeholder]
+
+# Mask all but coordinates (indices: 0=coords, 1=SV, 2=RS, 3=OSM, 4=POI)
+mask_indices = [1, 2, 3, 4]
+
+# Load pretrained UrbanFusion model
+ckpt = hf_hub_download("DominikM198/UrbanFusion", "UrbanFusion/UrbanFusion.ckpt")
+model = get_urbanfusion(ckpt, device=device).eval()
+
+# Encode inputs (output shape: [32, 768])
+with torch.no_grad():
+    embeddings = model(inputs, mask_indices=mask_indices, return_representations=True).cpu()
+```
+For a more comprehensive guide—including instructions on applying the model to downstream tasks and incorporating additional modalities (with options for downloading, preprocessing, and using contextual prompts with or without precomputed features)—see the following tutorials:
+
+- [`UrbanFusion_coordinates_only.ipynb`](https://github.com/DominikM198/UrbanFusion/blob/main/tutorials/UrbanFusion_coordinates_only.ipynb)
+- [`UrbanFusion_multimodal.ipynb`](https://github.com/DominikM198/UrbanFusion/blob/main/tutorials/UrbanFusion_multimodal.ipynb)
 
 ---
 
@@ -37,7 +72,9 @@
 title = {UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations},
 author = {Dominik J. Mühlematter and Lin Che and Ye Hong and Martin Raubal and Nina Wiedemann},
 year = {2025},
-journal = {arXiv preprint arXiv:
+journal = {arXiv preprint arXiv:2510.13774},
+eprint = {2510.13774},
+archivePrefix = {arXiv},
+url = {https://arxiv.org/abs/2510.13774},
 }
-```
----
+```
````
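A note for reviewers on the example this PR adds: `mask_indices` lists the modality slots to exclude (0 = coordinates, 1 = street view, 2 = remote sensing, 3 = OSM, 4 = POI), so `[1, 2, 3, 4]` keeps only coordinates. A minimal standalone sketch of that convention, in plain Python with a hypothetical helper (the actual masking happens inside the model):

```python
# Modality slots in the order the PR's usage example assumes them.
MODALITIES = ["coords", "street_view", "remote_sensing", "osm", "poi"]

def active_modalities(mask_indices):
    """Return the modality names that remain after masking.

    mask_indices lists the slots to EXCLUDE; every other slot is encoded.
    (Hypothetical helper, mirroring the comment in the usage example.)
    """
    excluded = set(mask_indices)
    return [name for i, name in enumerate(MODALITIES) if i not in excluded]

# Mask everything except coordinates, as in the README example.
print(active_modalities([1, 2, 3, 4]))  # prints ['coords']
```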