---
datasets:
- paidaixing/Image_mapillary_Street_level
- Jia-py/MP16-Pro
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- Geo-Localization
library_name: transformers
pipeline_tag: image-to-text
---
# ReGeo – A Direct Regression Approach for Global Image Geo-Localization
This paper presents a novel approach to Geo-Localization, the task
of predicting the geographic coordinates (latitude and longitude)
of an image from its visual content. Traditional methods in this
domain often rely on reference databases, complex pipelines, or
large-scale image classification networks. In contrast, we propose
a direct regression approach that simplifies the process by
predicting the geographic coordinates directly from image
features. We leverage a pre-trained Vision Transformer (ViT),
specifically a pre-trained CLIP model, for feature extraction and
introduce a regression head for coordinate prediction. Various
configurations, including pre-training and task-specific
adaptations, are tested and evaluated, resulting in our model,
ReGeo. Experimental results show that ReGeo offers competitive
performance compared to existing SOTA approaches while being
simpler and requiring minimal supporting code pipelines.
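The core idea, a frozen feature extractor followed by a small regression head that predicts normalized coordinates, can be sketched as follows. This is a minimal illustrative sketch, not the ReGeo implementation: the layer sizes, the MLP structure, and the `Tanh` output activation are assumptions, and a random tensor stands in for real CLIP image features.

```python
import torch
import torch.nn as nn

class CoordinateRegressionHead(nn.Module):
    """Maps image features to normalized (lat, lon) in [-1, 1]."""

    def __init__(self, feature_dim: int = 1024, hidden_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 2),
            nn.Tanh(),  # constrain both outputs to [-1, 1]
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mlp(features)

head = CoordinateRegressionHead()
features = torch.randn(1, 1024)  # stand-in for pooled CLIP image features
normalized = head(features)

# Denormalize: latitude spans [-90, 90], longitude spans [-180, 180]
latitude = normalized[0, 0].item() * 90
longitude = normalized[0, 1].item() * 180
```

Because the head outputs values in [-1, 1], the same scaling by 90 and 180 used in the inference example below recovers coordinates in degrees.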
- **Demo:** [Geo Guesser Game (German)](https://geo-guesser.llmhub.infs.ch/vs-ai)
## Model Details
- **Developed by:** Tobias Rothlin, tobias.rothlin@ost.ch
- **Supervisor:** Mitra Purandare, mitra.purandare@ost.ch
- **Model Card author:** Kevin Löffler, kevin.loeffler@ost.ch
## How to Get Started with the Model
Example inference:
```python
import torch
from PIL import Image

# ReGeo model class: https://github.com/TobiasRothlin/GeoLocalization/blob/main/src/DGX1/src/RegressionPretraining/Model.py
from model import LocationDecoder
from transformers import CLIPProcessor

# Load a custom config (do not use AutoConfig); an example can be found in this repo
config = { ... }

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

preprocessor = CLIPProcessor.from_pretrained('openai/clip-vit-large-patch14-336')
model = LocationDecoder.from_pretrained('OSTswiss/ReGeo', config=config)

# Prepare the model for inference
model.to(device)
model.eval()

# Load and preprocess the image (any size works; the processor resizes it)
image_path = 'path/to/your/image.jpg'
image = Image.open(image_path)
model_input = preprocessor(images=image, return_tensors="pt")
pixel_values = model_input['pixel_values'].to(device)

# Run inference; the model outputs normalized coordinates in [-1, 1]
with torch.no_grad():
    output = model(pixel_values)

# Denormalize to degrees: latitude in [-90, 90], longitude in [-180, 180]
normalized_coordinates = output.squeeze().tolist()
latitude = normalized_coordinates[0] * 90
longitude = normalized_coordinates[1] * 180
```