---
datasets:
- paidaixing/Image_mapillary_Street_level
- Jia-py/MP16-Pro
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- Geo-Localization
library_name: transformers
pipeline_tag: image-to-text
---
# ReGeo – A Direct Regression Approach for Global Image Geo-Localization

This paper presents a novel approach to Geo-Localization, the task
of predicting the geographic coordinates, i.e., latitude and
longitude, of an image based on its visual content. Traditional
methods in this domain often rely on databases, complex pipelines,
or large-scale image-classification networks. In contrast, we
propose a direct regression approach that simplifies the process
by predicting the geographic coordinates directly from the image
features. We leverage a pre-trained Vision Transformer (ViT),
specifically a pre-trained CLIP model, for feature extraction and
introduce a regression head for coordinate prediction. Various
configurations, including pre-training and task-specific
adaptations, are tested and evaluated, resulting in our model,
ReGeo. Experimental results show that ReGeo offers competitive
performance compared to existing SOTA approaches, despite being
simpler and needing minimal supporting code pipelines.
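
The architecture described above — a frozen-or-fine-tuned vision backbone followed by a small regression head that emits normalized coordinates — can be sketched as follows. This is an illustrative sketch only, not the actual `LocationDecoder` implementation: the layer sizes, activation, and `RegressionHead` name are assumptions, and the 768-dimensional input merely mimics a pooled CLIP image embedding.

```python
import torch
import torch.nn as nn

class RegressionHead(nn.Module):
    """Illustrative head mapping pooled image features to normalized
    (lat, lon) in [-1, 1]; layer sizes are assumptions, not ReGeo's."""

    def __init__(self, feature_dim: int = 768, hidden_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 2),   # two outputs: latitude, longitude
            nn.Tanh(),                  # constrain both outputs to [-1, 1]
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mlp(features)

# usage with dummy CLIP-sized features (batch of 4 pooled embeddings)
head = RegressionHead()
features = torch.randn(4, 768)
coords = head(features)        # shape (4, 2), values in [-1, 1]
lat = coords[:, 0] * 90        # denormalize to degrees latitude
lon = coords[:, 1] * 180       # denormalize to degrees longitude
```

Normalizing targets to [-1, 1] and bounding the output with `tanh` keeps the regression well-conditioned and guarantees predictions are always valid coordinates.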

- **Demo:** [Geo Guesser Game (German)](https://geo-guesser.llmhub.infs.ch/vs-ai)


## Model Details

- **Developed by:** Tobias Rothlin, tobias.rothlin@ost.ch
- **Supervisor:** Mitra Purandare, mitra.purandare@ost.ch
- **Model Card author:** Kevin Löffler, kevin.loeffler@ost.ch


## How to Get Started with the Model

Example inference:

```python
# imports
import torch
from PIL import Image
from model import LocationDecoder  # ReGeo model class: https://github.com/TobiasRothlin/GeoLocalization/blob/main/src/DGX1/src/RegressionPretraining/Model.py
from transformers import CLIPProcessor

# load the custom config (do not use AutoConfig); an example config is included in this repo
config = { ... }

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
preprocessor = CLIPProcessor.from_pretrained('openai/clip-vit-large-patch14-336')
model = LocationDecoder.from_pretrained('OSTswiss/ReGeo', config=config)

# move the model to the device and set it to inference mode
model.to(device)
model.eval()

# load and preprocess the image
image_path = 'path/to/your/image.jpg'  # any resolution; the processor resizes it
image = Image.open(image_path).convert('RGB')
model_input = preprocessor(images=image, return_tensors="pt")
pixel_values = model_input['pixel_values'].to(device)

# run inference; the model outputs coordinates normalized to [-1, 1]
with torch.no_grad():
    output = model(pixel_values)
    normalized_coordinates = output.squeeze().tolist()
    latitude = normalized_coordinates[0] * 90    # degrees latitude
    longitude = normalized_coordinates[1] * 180  # degrees longitude
```
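
Geo-localization results are typically evaluated by the great-circle distance between the predicted and true coordinates. The helper below is a standard haversine implementation, independent of the model code above; the example coordinates are illustrative, not model outputs.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 \
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# e.g. compare an illustrative prediction against a ground-truth location
error_km = haversine_km(47.05, 8.31, 47.3769, 8.5417)
```

Distances under a set of thresholds (e.g. 1 km, 25 km, 200 km) are commonly aggregated into threshold accuracies when benchmarking geo-localization models.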