Improve model card: pipeline tag, paper link, and usage example

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +46 -9
README.md CHANGED
@@ -1,13 +1,13 @@
  ---
- license: cc-by-4.0
- datasets:
- - DominikM198/PP2-M
  base_model:
  - openai/clip-vit-large-patch14
  - BAAI/bge-small-en-v1.5
  - torchgeo/vit_small_patch16_224_sentinel2_all_moco
  - DominikM198/OSM-MAE
- pipeline_tag: any-to-any
+ datasets:
+ - DominikM198/PP2-M
+ license: cc-by-4.0
+ pipeline_tag: feature-extraction
  tags:
  - SpatialRepresentationLearning
  - GeoFoundationModel
@@ -15,9 +15,10 @@ tags:
  - ContrastiveLearning
  - Mutlimodal
  ---
+
  # UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations

- This repository provides the **pretrained weights** of the **UrbanFusion** model — a framework for learning robust spatial representations through stochastic multimodal fusion.
+ This repository provides the **pretrained weights** of the **UrbanFusion** model — a framework for learning robust spatial representations through stochastic multimodal fusion, as presented in the paper [UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations](https://huggingface.co/papers/2510.13774).

  UrbanFusion can generate **location encodings** from *any subset* of the following modalities:
  - 📍 Geographic coordinates
@@ -26,7 +27,41 @@ UrbanFusion can generate **location encodings** from *any subset* of the followi
  - 🗺️ OSM basemaps
  - 🏬 Points of interest (POIs)

- 🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion), and further details are described in our paper.
+ 🔗 The full **source code** is available on [GitHub](https://github.com/DominikM198/UrbanFusion).
+
+ ---
+
+ ## Minimal Usage Example
+ Using pretrained models for location encoding is straightforward. The example below demonstrates how to load the model and generate representations based solely on geographic coordinates (latitude and longitude), without requiring any additional input modalities.
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from srl.multi_modal_encoder.load import get_urbanfusion
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Coordinates: batch of 32 (lat, lon) pairs
+ coords = torch.randn(32, 2).to(device)
+
+ # Placeholders for other modalities (SV, RS, OSM, POI)
+ placeholder = torch.empty(32).to(device)
+ inputs = [coords, placeholder, placeholder, placeholder, placeholder]
+
+ # Mask all but coordinates (indices: 0=coords, 1=SV, 2=RS, 3=OSM, 4=POI)
+ mask_indices = [1, 2, 3, 4]
+
+ # Load pretrained UrbanFusion model
+ ckpt = hf_hub_download("DominikM198/UrbanFusion", "UrbanFusion/UrbanFusion.ckpt")
+ model = get_urbanfusion(ckpt, device=device).eval()
+
+ # Encode inputs (output shape: [32, 768])
+ with torch.no_grad():
+     embeddings = model(inputs, mask_indices=mask_indices, return_representations=True).cpu()
+ ```
+ For a more comprehensive guide—including instructions on applying the model to downstream tasks and incorporating additional modalities (with options for downloading, preprocessing, and using contextual prompts with or without precomputed features)—see the following tutorials:
+
+ - [`UrbanFusion_coordinates_only.ipynb`](https://github.com/DominikM198/UrbanFusion/blob/main/tutorials/UrbanFusion_coordinates_only.ipynb)
+ - [`UrbanFusion_multimodal.ipynb`](https://github.com/DominikM198/UrbanFusion/blob/main/tutorials/UrbanFusion_multimodal.ipynb)

  ---

@@ -37,7 +72,9 @@ UrbanFusion can generate **location encodings** from *any subset* of the followi
  title = {UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations},
  author = {Dominik J. Mühlematter and Lin Che and Ye Hong and Martin Raubal and Nina Wiedemann},
  year = {2025},
- journal = {arXiv preprint arXiv:xxxx.xxxxx}
+ journal = {arXiv preprint arXiv:2510.13774},
+ eprint = {2510.13774},
+ archivePrefix = {arXiv},
+ url = {https://arxiv.org/abs/2510.13774},
  }
- ```
- ---
+ ```
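For readers adapting the usage example in this diff, here is a minimal sketch of one way to consume the resulting location encodings downstream. It is illustrative only and not part of the UrbanFusion API: the `embeddings` tensor below stands in for the `[32, 768]` output of the example above, and the similarity check and linear probe are generic patterns, not something prescribed by the repository.

```python
import torch
import torch.nn.functional as F

# Stand-in for the [32, 768] location encodings produced by the
# UrbanFusion usage example (random values for illustration only).
embeddings = torch.randn(32, 768)

# Cosine similarity between location encodings, e.g. to find which
# other location in the batch is most similar to location 0.
normed = F.normalize(embeddings, dim=1)
similarity = normed @ normed.T  # [32, 32]
most_similar = similarity[0, 1:].argmax().item() + 1
print(f"Most similar to location 0: location {most_similar}")

# A simple linear probe on frozen encodings, as one might do for a
# downstream regression task (targets are random placeholders here).
targets = torch.randn(32, 1)
probe = torch.nn.Linear(768, 1)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(100):
    optimizer.zero_grad()
    loss = F.mse_loss(probe(embeddings), targets)
    loss.backward()
    optimizer.step()
```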