Zero-Shot Image Classification
Transformers
PyTorch
English
clip
geolocalization
geolocation
geographic
street
climate
urban
rural
multi-modal
geoguessr
How to use geolocal/StreetCLIP with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="geolocal/StreetCLIP")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)

# Load model directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("geolocal/StreetCLIP")
model = AutoModelForZeroShotImageClassification.from_pretrained("geolocal/StreetCLIP")
```
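The README describes the evaluation technique as sequentially identifying the correct country and then the city. The two-stage idea can be sketched on top of the zero-shot pipeline as follows. This is only an illustration, not the authors' implementation: the helper names (`best_label`, `predict_country_then_city`, `streetclip_scorer`), the use of bare place names as candidate labels, and the shape of the country-to-cities mapping are all assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps (image, candidate_labels) to a list of
# {"label": str, "score": float} dicts, the format returned by the
# zero-shot-image-classification pipeline.
Scorer = Callable[[object, List[str]], List[Dict[str, object]]]


def best_label(scorer: Scorer, image, labels: Sequence[str]) -> str:
    """Return the candidate label with the highest score."""
    results = scorer(image, list(labels))
    return max(results, key=lambda r: r["score"])["label"]


def predict_country_then_city(
    scorer: Scorer, image, cities_by_country: Dict[str, List[str]]
) -> Tuple[str, str]:
    """Pick a country first, then choose only among that country's cities."""
    country = best_label(scorer, image, list(cities_by_country))
    city = best_label(scorer, image, cities_by_country[country])
    return country, city


def streetclip_scorer() -> Scorer:
    """Hypothetical helper that wraps the StreetCLIP pipeline as a Scorer."""
    # Imported lazily so the pure-Python helpers above work without transformers.
    from transformers import pipeline

    pipe = pipeline("zero-shot-image-classification", model="geolocal/StreetCLIP")
    return lambda image, labels: pipe(image, candidate_labels=labels)
```

A call might look like `predict_country_then_city(streetclip_scorer(), "street.jpg", {"France": ["Paris", "Lyon"], "Japan": ["Tokyo", "Osaka"]})`, where `street.jpg` is a placeholder image path. Restricting the second stage to the chosen country keeps the candidate list short, at the cost of not being able to recover from a wrong country guess.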
Update README.md
README.md
CHANGED
@@ -162,7 +162,7 @@ StreetCLIP was evaluated in zero-shot on two open-domain image geolocalization b
 technique called hierarchical linear probing. Hierarchical linear probing sequentially attempts to
 identify the correct country and then city of geographical image origin.
 
-## Testing Data
+## Testing Data and Metrics
 
 ### Testing Data
 
@@ -180,13 +180,24 @@ to the ground truth coordinates and then looks at what percentage of error dista
 
 ## Results
 
-
+**IM2GPS**
+
+| | Distance (% @ km) |
+| Model | City | Region | Country | Continent |
+| | 25km | 200km | 750km | 2,500km |
+|----------|:-------------:|:------:|:------:|:------:|
+| PlaNet (2016) | 24.5 | 37.6 | 53.6 | 71.3 |
+| ISNs (2018) | 43.0 | 51.9 | 66.7 | 80.2 |
+| TransLocator (2022) | **48.1** | **64.6** | **75.6** | 86.7 |
+| **Zero-Shot CLIP (ours)** | 27.0 | 42.2 | 71.7 | 86.9 |
+| **Zero-Shot StreetCLIP (ours)** | 28.3 | 45.1 | 74.7 | **88.2** |
+
 
 ### Summary
 
 Our experiments demonstrate that our synthetic caption pretraining method is capable of significantly
 improving CLIP's generalized zero-shot capabilities applied to open-domain image geolocalization while
-achieving
+achieving state-of-the-art performance on a selection of benchmark metrics.
 
 # Environmental Impact
 
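The results table added in this commit reports "Distance (% @ km)" at 25, 200, 750, and 2,500 km, and the hunk context mentions measuring error against ground-truth coordinates. A minimal sketch of such a metric is shown below. It assumes great-circle (haversine) distance, a mean Earth radius of 6,371 km, and an inclusive threshold comparison; the README excerpt confirms none of these choices, and the function names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Threshold names and distances taken from the results table.
THRESHOLDS_KM = {"City": 25, "Region": 200, "Country": 750, "Continent": 2500}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def percent_within(
    predicted: Sequence[Coord],
    truth: Sequence[Coord],
    thresholds: Dict[str, float] = THRESHOLDS_KM,
) -> Dict[str, float]:
    """Percentage of predictions whose error is within each distance threshold."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("predicted and truth must be non-empty and equal in length")
    errors = [haversine_km(p, t) for p, t in zip(predicted, truth)]
    return {
        name: 100.0 * sum(e <= km for e in errors) / len(errors)
        for name, km in thresholds.items()
    }
```

For example, a prediction of Paris for an image taken in London is roughly 340 km off, so under these assumptions it would count toward the Country and Continent columns but not City or Region.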