geospot-base / README.md

sdan

metadata

license: apache-2.0
tags:
  - geolocation
  - vision
  - siglip
  - clip
  - geoclip
datasets:
  - osv5m
pipeline_tag: image-feature-extraction

GeoSpot Base

A geolocation model built on SigLIP2-so400m (512px) that predicts GPS coordinates from images.

Model Details

Backbone: google/siglip2-so400m-patch16-512 (frozen)
Image Resolution: 512x512
Embedding Dim: 512
Training Steps: 206k
Training Data: ~10.6M street-view images

Architecture

GeoCLIP-style contrastive learning between:

Image Encoder: SigLIP2 vision tower + MLP projection (1152 → 512)
Location Encoder: Multi-scale random Fourier feature (RFF) encoding with learnable capsules
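
The multi-scale RFF step of the location encoder can be sketched as follows. This is a minimal NumPy illustration, not the model's actual code: the function name `multiscale_rff`, the scale values in `sigmas`, the feature count, and the coordinate normalisation are all assumptions chosen for illustration. The learnable per-scale capsules are omitted, so the per-scale features are simply concatenated.

```python
import numpy as np

def multiscale_rff(coords, sigmas=(1.0, 16.0, 256.0), num_features=8, seed=0):
    """Encode (lat, lon) pairs with random Fourier features at several scales.

    coords: array of shape (N, 2), in degrees.
    Returns an array of shape (N, len(sigmas) * 2 * num_features).
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=np.float64)
    # Scale to roughly [-1, 1] so both axes use comparable frequencies (assumption).
    x = coords / np.array([90.0, 180.0])
    feats = []
    for sigma in sigmas:
        # Fixed random projection; a larger sigma gives higher frequencies.
        B = rng.normal(0.0, sigma, size=(2, num_features))
        proj = 2.0 * np.pi * x @ B
        feats.append(np.concatenate([np.cos(proj), np.sin(proj)], axis=1))
    return np.concatenate(feats, axis=1)

gps = np.array([[48.8584, 2.2945], [40.6892, -74.0445]])
encoded = multiscale_rff(gps)
print(encoded.shape)  # (2, 48)
```

Small `sigma` values give smooth features that vary slowly across the globe, while large values separate nearby coordinates, which is the usual motivation for combining several scales.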

Usage

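from safetensors.torch import load_file
from geoclip.model.GeoCLIP import GeoCLIP

# Build the architecture without loading the default pretrained weights
model = GeoCLIP(from_pretrained=False, encoder_name="siglip2")

# A .safetensors checkpoint must be read with safetensors, not torch.load
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Predict location from image
top_gps, top_probs = model.predict("image.jpg", top_k=5)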
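
In GeoCLIP-style models, prediction is typically a retrieval step: the image embedding is compared against embeddings of a gallery of candidate GPS coordinates, and the best matches are returned with their probabilities. The sketch below illustrates that idea with NumPy on random stand-in embeddings. The function `top_k_locations` and every array here are hypothetical, the learned temperature that such models usually apply to the logits is left out, and this is not necessarily how `model.predict` is implemented.

```python
import numpy as np

def top_k_locations(image_emb, location_embs, gallery, top_k=5):
    """Rank gallery coordinates by cosine similarity to one image embedding."""
    # L2-normalise so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    location_embs = location_embs / np.linalg.norm(location_embs, axis=1, keepdims=True)
    logits = location_embs @ image_emb
    # Softmax over the gallery (shifted by the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Indices of the top_k most probable gallery entries, best first.
    order = np.argsort(-probs)[:top_k]
    return gallery[order], probs[order]

rng = np.random.default_rng(0)
gallery = rng.uniform([-90.0, -180.0], [90.0, 180.0], size=(100, 2))  # stand-in GPS gallery
location_embs = rng.normal(size=(100, 512))  # stand-in for location-encoder outputs
image_emb = rng.normal(size=512)             # stand-in for a projected image embedding

top_gps, top_probs = top_k_locations(image_emb, location_embs, gallery)
print(top_gps.shape, top_probs.shape)  # (5, 2) (5,)
```

Because the result is limited to coordinates in the gallery, the achievable accuracy of this kind of retrieval depends on how densely the gallery covers the regions of interest.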