sdan/geospot-base

	---
	license: apache-2.0
	tags:
	- geolocation
	- vision
	- siglip
	- clip
	- geoclip
	datasets:
	- osv5m
	pipeline_tag: image-feature-extraction
	---

	# GeoSpot Base

	A geolocation model built on SigLIP2-so400m (512px) that predicts GPS coordinates from images.

	## Model Details

	- Backbone: google/siglip2-so400m-patch16-512 (frozen)
	- Image Resolution: 512x512
	- Embedding Dim: 512
	- Training Steps: 206k
	- Training Data: ~10.6M street-view images

	## Architecture

	GeoCLIP-style contrastive learning between:
	- Image Encoder: SigLIP2 vision tower + MLP projection (1152 → 512)
	- Location Encoder: Multi-scale random Fourier feature (RFF) encoding with learnable capsules
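
To make the location-encoder idea concrete, here is a minimal sketch of multi-scale RFF encoding in plain Python. It is not the model's code: the helper names, the scale values `(1.0, 16.0, 256.0)`, the feature count, and the simple latitude/longitude normalization are all assumptions chosen for illustration. In a GeoCLIP-style encoder, each scale's features would typically pass through its own small learnable MLP (a "capsule") and the outputs would be summed; that learnable part is omitted here.

```python
import math
import random


def make_rff_matrix(sigma, num_features, in_dim=2, seed=0):
    """Sample a fixed random projection B with entries drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
            for _ in range(num_features)]


def rff_encode(coords, B):
    """Random Fourier features: [cos(2*pi*B v), sin(2*pi*B v)]."""
    proj = [2.0 * math.pi * sum(b * c for b, c in zip(row, coords))
            for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


def multi_scale_rff(lat, lon, sigmas=(1.0, 16.0, 256.0), num_features=4):
    """Encode one GPS point at several frequency scales (illustrative values).

    Small sigmas capture coarse, continent-level structure; large sigmas
    capture fine, street-level structure.
    """
    # Assumption: scale coordinates to [-1, 1]; a real encoder may instead
    # apply a map projection before the Fourier features.
    coords = (lat / 90.0, lon / 180.0)
    return [rff_encode(coords, make_rff_matrix(s, num_features, seed=i))
            for i, s in enumerate(sigmas)]


features = multi_scale_rff(48.8566, 2.3522)  # one feature vector per scale
```

Each scale yields `2 * num_features` values in `[-1, 1]`, so the three scales above produce three 8-dimensional vectors.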

	## Usage

	```python
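	from geoclip.model.GeoCLIP import GeoCLIP
	from safetensors.torch import load_file

	# Build the architecture without loading the package's pretrained GeoCLIP weights
	model = GeoCLIP(from_pretrained=False, encoder_name="siglip2")

	# .safetensors files are not pickle checkpoints, so torch.load cannot read them
	state_dict = load_file("model.safetensors")
	model.load_state_dict(state_dict)
	model.eval()  # inference mode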

	# Predict location from image
	top_gps, top_probs = model.predict("image.jpg", top_k=5)
	```
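
For intuition about what `top_gps` and `top_probs` represent, the sketch below shows how a GeoCLIP-style model can turn embeddings into ranked predictions: the image embedding is compared against a gallery of candidate GPS embeddings, the similarities are passed through a softmax, and the top `k` candidates are returned. This is an assumption about the general approach, not this repository's implementation; `rank_gallery`, `logit_scale`, and the toy 2-D embeddings are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def rank_gallery(image_emb, gallery_embs, gallery_gps, top_k=5, logit_scale=1.0):
    """Rank candidate GPS points by softmax over cosine similarity."""
    img = l2_normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, l2_normalize(g)))
              for g in gallery_embs]
    # Numerically stable softmax over the whole gallery
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top_k most probable candidates, best first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [gallery_gps[i] for i in order], [probs[i] for i in order]


# Toy example: three candidate locations with made-up 2-D embeddings
gallery_gps = [(48.86, 2.35), (40.71, -74.01), (35.68, 139.69)]
gallery_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
top_gps, top_probs = rank_gallery([0.9, 0.1], gallery_embs, gallery_gps, top_k=2)
```

Under this scheme the probabilities are relative to the gallery, so they indicate which candidates the model prefers rather than an absolute confidence in the location.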