davanstrien (HF Staff) committed
Commit 5b4a0af · verified · 1 Parent(s): 71b2223

Update README.md

Files changed (1)
  1. README.md +67 -25
README.md CHANGED
@@ -3,38 +3,36 @@ tags:
  - geospatial
  ---

- # Model Card for SatCLIP-ResNet18-L10
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- Models described in https://github.com/microsoft/satclip/

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

- ## Use the encoder

  ```python
  from huggingface_hub import hf_hub_download
@@ -46,7 +44,7 @@ device = "cuda" if torch.cuda.is_available() else "cpu"
  c = torch.randn(32, 2) # Represents a batch of 32 locations (lon/lat)

  model = get_satclip(
-     hf_hub_download("davanstrien/SatCLIP-ResNet18-L10", "satclip-resnet18-l10.ckpt"),
      device=device,
  ) # Only loads location encoder by default
  model.eval()
@@ -55,14 +53,58 @@ with torch.no_grad():
  ```

- ## Citation

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

- **BibTeX:**

  @article{klemmer2023satclip,
    title={SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery},
-   author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
-   journal={arXiv preprint arXiv:2311.17179},
    year={2023}
  - geospatial
  ---

+ # Model Card for SatCLIP
+
+ Here we provide accompanying information about our model SatCLIP.

  ## Model Details

  ### Model Description

+ SatCLIP is a model for contrastive pretraining on satellite image-location pairs. Training is analogous to the popular [CLIP](https://github.com/openai/CLIP) model.
+
+ - **Developed by:** Konstantin Klemmer, Marc Russwurm, Esther Rolf, Caleb Robinson, Lester Mackey
+ - **Model type:** Location and image encoder model pretrained using contrastive image-location matching.
+ - **License:** MIT

+ ### Model Sources

+ - **Repository:** [github.com/microsoft/satclip](https://github.com/microsoft/satclip)
+ - **Paper:** TBA

+ ## Uses

+ SatCLIP includes an *image* and a *location* encoder. The image encoder processes multi-spectral satellite images of size `[height, width, 13]` into `[d]`-dimensional latent vectors. The location encoder processes location coordinates `[longitude, latitude]` into the same `[d]`-dimensional space.

+ SatCLIP is a model trained and tested for use in research projects. It is not intended for use in production environments.

+ ### Downstream Use

+ The SatCLIP location encoder learns location characteristics, as captured by the satellite images, and can be deployed for downstream geospatial prediction tasks. Practically, this involves *querying* the location encoder for the `[d]`-dimensional vector embedding of each downstream location and then using that embedding as a predictor during downstream learning. In our paper, we show the usability of the learned location embeddings for predicting, e.g., population density or biomes.
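As an illustrative sketch of this workflow (not from the SatCLIP repository): random vectors below stand in for the `[d]`-dimensional embeddings you would query from the location encoder, and a simple closed-form ridge regression serves as the downstream model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for SatCLIP location embeddings: in practice you would run the
# location encoder on each downstream location to obtain these [n, d] vectors.
n, d = 500, 256
embeddings = rng.normal(size=(n, d))

# Synthetic downstream target (e.g. log population density).
true_w = rng.normal(size=d)
y = embeddings @ true_w + 0.1 * rng.normal(size=n)

# Ridge regression on the embeddings, closed form: w = (X^T X + lam I)^-1 X^T y
lam = 1.0
w = np.linalg.solve(embeddings.T @ embeddings + lam * np.eye(d), embeddings.T @ y)
pred = embeddings @ w

mse = np.mean((pred - y) ** 2)
print(f"train MSE: {mse:.4f}")
```

Any downstream learner works here; ridge regression is only chosen because it needs no extra dependencies.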
+
+ #### Use the encoder

  ```python
  from huggingface_hub import hf_hub_download
 
  c = torch.randn(32, 2) # Represents a batch of 32 locations (lon/lat)

  model = get_satclip(
+     hf_hub_download("microsoft/SatCLIP-ResNet18-L10", "satclip-resnet18-l10.ckpt"),
      device=device,
  ) # Only loads location encoder by default
  model.eval()

  ```

+ ### Out-of-Scope Use

+ Potential use cases of SatCLIP which we did not build the model for and did not test include:
+ * The SatCLIP image encoder can in theory be used to help with satellite image localization. If this application interests you, we encourage you to check work focusing on this, e.g. [Cepeda et al. (2023)](https://arxiv.org/abs/2309.16020).
+ * Fine-grained geographic problems (i.e. problems constrained to small geographic areas or including many close locations) are out of scope for SatCLIP. SatCLIP location encoders are pretrained for global-scale use.
+ * Any use outside of research projects is currently out of scope, as we don't evaluate SatCLIP in production environments.
 
63
+ ## Bias, Risks, and Limitations
64
+
65
+ The following aspects should be considered before using SatCLIP:
66
+ * SatCLIP is trained with freely available Sentinel-2 satellite imagery with a resolution of 10m per pixel. This allows the model to learn larger structures like cities or mountain ranges, but not small scale structures like individual vehicles or people. SatCLIP models are not applicable for fine-grained geospatial problems.
67
+ * Location embeddings from SatCLIP only capture location characteristics that represent visually in satellite imagery (at our given resolution). Applications in problems that can not be captured through satellite images are out-of-score for SatCLIP.
68
+ * Use cases in the defense or surveillance domain are always out-of-scope regardless of performance of SatCLIP. The use of artificial intelligence for such tasks is premature currently given the lack of testing norms and checks to ensure its fair use.
69
+
+ ## How to Get Started with the Model
+
+ Information about how to get started with SatCLIP training and deployment in downstream modelling can be found in our GitHub repository at [github.com/microsoft/satclip](https://github.com/microsoft/satclip).
+
+ ## Training Details
+
+ ### Training Data
+
+ SatCLIP is trained using the *S2-100K* dataset, which samples 100,000 multi-spectral satellite image scenes from Sentinel-2 via the [Microsoft Planetary Computer](https://planetarycomputer.microsoft.com/). Scenes are sampled approximately uniformly over landmass and are only included in the dataset if they don't exhibit cloud coverage. More details can be found in our paper.
+
+ ### Training Procedure
+
+ SatCLIP is trained via contrastive learning, by matching the correct image-location pairs in a batch of images and locations. Each image and each location is processed by an encoder and transformed into a `[d]`-dimensional embedding. The training objective is to maximize the cosine similarity of matching image and location embeddings while minimizing it for non-matching pairs.
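As a minimal sketch of this CLIP-style objective (a NumPy stand-in for illustration, not the repository's implementation), the symmetric contrastive loss over a batch of paired embeddings can be written as:

```python
import numpy as np

def clip_loss(img_emb, loc_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss over a batch of paired embeddings."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    loc = loc_emb / np.linalg.norm(loc_emb, axis=1, keepdims=True)

    logits = img @ loc.T / temperature  # [batch, batch] similarity matrix
    labels = np.arange(len(logits))     # the correct pair sits on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average of image->location and location->image cross-entropies: minimizing
    # this pulls matching pairs together and pushes non-matching pairs apart.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
img, loc = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(clip_loss(img, loc))  # unrelated pairs -> high loss
print(clip_loss(img, img))  # perfectly matched pairs -> loss near 0
```

The temperature value here is an illustrative choice (0.07 is the common CLIP default), not necessarily the one used for SatCLIP.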
+
+ #### Training Hyperparameters
+
+ The key hyperparameters of SatCLIP are the batch size, learning rate and weight decay. On top of this, the specific location and vision encoders come with their own hyperparameters. Key hyperparameters for the location encoder include resolution-specific hyperparameters in the positional encoding (e.g. the number of Legendre polynomials used for the spherical harmonics calculation) and the type, number of layers and capacity of the neural network deployed. For the vision encoder, key hyperparameters depend on the type of vision backbone deployed (e.g. ResNet, Vision Transformer). More details can be found in our paper.
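To make that hyperparameter surface concrete, here is a hypothetical configuration sketch; the names and values are illustrative assumptions, not the repository's actual defaults.

```python
# Hypothetical SatCLIP-style training configuration; all names and values are
# illustrative assumptions, not the actual repository defaults.
config = {
    # Optimization hyperparameters.
    "batch_size": 512,
    "learning_rate": 1e-4,
    "weight_decay": 0.01,
    # Location encoder: positional-encoding resolution plus network capacity.
    "location_encoder": {
        "legendre_polynomials": 10,  # spherical harmonics resolution (the "L10" in model names)
        "hidden_layers": 2,
        "hidden_dim": 256,
    },
    # Vision encoder: the backbone choice brings its own hyperparameters.
    "vision_encoder": {
        "backbone": "resnet18",  # e.g. a ResNet or a Vision Transformer
    },
}
print(sorted(config))
```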
+
+ #### Training Speed
+
+ Training SatCLIP for 500 epochs using pretrained vision encoders takes roughly 2 days on a single A100 GPU.
+
+ ## Evaluation
+
+ SatCLIP can be evaluated throughout training and during downstream deployment. During training, we log model loss on a held-out, unseen validation set to monitor the training process for potential overfitting. When SatCLIP embeddings are used in downstream applications, any predictive score can be used for evaluation, e.g. mean squared error (MSE) for regression or accuracy for classification problems.
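For instance, the two metrics mentioned can be computed as follows (an illustrative sketch with made-up predictions, not results from the paper):

```python
import numpy as np

# Hypothetical downstream predictions vs. ground truth (made-up numbers).
y_true = np.array([2.0, 0.5, 3.1, 1.2])
y_pred = np.array([1.8, 0.7, 3.0, 1.5])

# Mean squared error for a regression task (e.g. population density).
mse = np.mean((y_pred - y_true) ** 2)

# Accuracy for a classification task (e.g. biome labels).
labels_true = np.array([0, 1, 1, 2])
labels_pred = np.array([0, 1, 0, 2])
accuracy = np.mean(labels_true == labels_pred)

print(mse, accuracy)  # 0.045 0.75
```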
+
+ ## Citation
+
+ **BibTeX:**
+ ```bibtex
  @article{klemmer2023satclip,
    title={SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery},
+   author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Russwurm, Marc},
+   journal={TBA},
    year={2023}
+ }
+ ```
+
+ ## Model Card Contact
+
+ For feedback and comments, contact [kklemmer@microsoft.com](mailto:kklemmer@microsoft.com).