---
license: apache-2.0
tags:
- geolocation
- vision
- siglip
- clip
- geoclip
datasets:
- osv5m
pipeline_tag: image-feature-extraction
---
# GeoSpot Base
A geolocation model built on SigLIP2-so400m (512px) that predicts GPS coordinates from images.
## Model Details
- **Backbone**: google/siglip2-so400m-patch16-512 (frozen)
- **Image Resolution**: 512x512
- **Embedding Dim**: 512
- **Training Steps**: 206k
- **Training Data**: ~10.6M street-view images
## Architecture
GeoCLIP-style contrastive learning between:
- **Image Encoder**: SigLIP2 vision tower + MLP projection (1152 → 512)
- **Location Encoder**: Multi-scale random Fourier feature (RFF) encoding with learnable capsules
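
To illustrate the location-encoder idea, here is a minimal sketch of multi-scale random Fourier feature (RFF) encoding for a latitude/longitude pair, written in plain Python so it runs without the model. The function names, the scales `(1.0, 16.0, 256.0)`, the feature count, and the simple degree normalization are all illustrative assumptions, not values read from this checkpoint. The learnable capsules that would process each scale are omitted.

```python
import math
import random


def make_rff_matrix(num_features, sigma, seed=0):
    """Sample num_features random 2-D frequencies from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(num_features)]


def rff_encode(lat, lon, freqs):
    """Map (lat, lon) in degrees to [cos(2*pi*Bx), sin(2*pi*Bx)]."""
    # Assumption: scale degrees to roughly [-1, 1]; a real model may use a map projection instead.
    x0, x1 = lat / 90.0, lon / 180.0
    proj = [2.0 * math.pi * (b0 * x0 + b1 * x1) for b0, b1 in freqs]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


def multiscale_encode(lat, lon, sigmas=(1.0, 16.0, 256.0), num_features=8):
    """Concatenate RFF encodings at several frequency scales (coarse to fine)."""
    feats = []
    for i, sigma in enumerate(sigmas):
        freqs = make_rff_matrix(num_features, sigma, seed=i)
        feats.extend(rff_encode(lat, lon, freqs))
    return feats


# Encode one coordinate: 3 scales x (8 cos + 8 sin) = 48 features.
features = multiscale_encode(48.8584, 2.2945)
print(len(features))
```

Small `sigma` values give smooth features that vary slowly across the globe, while large values give features that separate nearby points. Combining several scales lets the encoder represent both.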
## Usage
```python
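from safetensors.torch import load_file
from geoclip.model.GeoCLIP import GeoCLIP

# Build the model without loading the default pretrained weights
model = GeoCLIP(from_pretrained=False, encoder_name="siglip2")
# .safetensors checkpoints are read with safetensors, not torch.load
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Predict the 5 most likely GPS coordinates for an image
top_gps, top_probs = model.predict("image.jpg", top_k=5)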
```
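
In GeoCLIP-style models, top-k prediction is typically a retrieval step: the image embedding is scored against the embeddings of a gallery of candidate GPS coordinates, and the best-scoring candidates are returned with their probabilities. The sketch below shows that step with toy data in plain Python. The helper names, the 3-dimensional embeddings, and the `logit_scale` value are hypothetical and are not taken from this model's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_locations(image_emb, gallery_embs, gallery_gps, k=2, logit_scale=10.0):
    """Rank gallery coordinates by scaled cosine similarity to the image embedding."""
    img = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(g)))
        for g in gallery_embs
    ]
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [gallery_gps[i] for i in order], [probs[i] for i in order]


# Toy gallery: three candidate coordinates with made-up embeddings.
gallery_gps = [(48.8584, 2.2945), (40.6892, -74.0445), (35.6586, 139.7454)]
gallery_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]  # closest to the first gallery entry

top_gps, top_probs = top_k_locations(image_emb, gallery_embs, gallery_gps, k=2)
print(top_gps[0])
```

Because predictions are drawn from the gallery, the achievable accuracy depends on how densely the gallery covers the regions of interest.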