Global City Similarity Search with TerraMind Embeddings
Repository: https://github.com/jankomag/urban-embeddings-explorer/
Live Demo: https://urban-embeddings-explorer.vercel.app/
Contact: jan@magnuszewski.com
What This Is and Motivation
This project explores how TerraMind "sees" urban areas globally by creating a similarity search system that discovers urban patterns across continents. As someone fascinated by cities, urban form, and GeoAI, I wanted to understand how a foundation model perceives cities worldwide and develop an interactive tool to showcase this capability.
This is a passion project I created to learn how to use geospatial foundation models and share something that applies these technologies to the urban domain. This isn't a research project, but rather an exploration of foundation model capabilities and a demonstration of using this technology for urban analysis.
Technical Implementation
The first step was extracting embeddings from the TerraMind model. I decided to use optical imagery from Sentinel-2 L2A with the same 12 bands used in model training. I prepared a dataset of over 200 cities (filtered to cities above 100k population, with a maximum of 5 per country to ensure regional diversity).
I created a pipeline that:
- Extracts imagery from STAC catalogues
- Builds a cloud-masked median composite from the most recent months of data (using Dask for processing)
- Applies the expected transformations before passing data to the model
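The core of the compositing step can be sketched as follows. This is a minimal NumPy illustration of the mask-then-median logic, not the actual Dask/STAC pipeline; the array shapes, cloud fraction, and random data are purely illustrative:

```python
import numpy as np

# Toy stack of Sentinel-2 observations: (time, band, y, x).
# Assumption: a boolean cloud mask per timestep, e.g. derived from the SCL band.
rng = np.random.default_rng(0)
stack = rng.uniform(0.0, 0.3, size=(6, 12, 8, 8)).astype(np.float32)
cloud_mask = rng.random((6, 8, 8)) < 0.2  # True where a pixel is cloudy

# Replace cloudy pixels with NaN in every band, then take the per-pixel
# median over time to obtain a single cloud-free composite
masked = np.where(cloud_mask[:, None, :, :], np.nan, stack)
composite = np.nanmedian(masked, axis=0)  # shape (band, y, x) = (12, 8, 8)
```

In the real pipeline the same reduction runs lazily over Dask-backed arrays, so only the final composite is materialised.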
This results in 196 patch embeddings of 768 dimensions each per 224×224-pixel tile, for a total of over 48,000 tiles globally.
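The per-tile numbers follow from standard ViT-style patching. The 16×16-pixel patch size is an assumption, but it is the one consistent with 196 patches per 224×224 input:

```python
# ViT-style patch-grid arithmetic (16x16-pixel patches assumed,
# consistent with 196 patches per 224x224-pixel tile)
tile_px, patch_px, embed_dim = 224, 16, 768
patches_per_side = tile_px // patch_px      # 14
patches_per_tile = patches_per_side ** 2    # 196
# => each tile yields a (196, 768) array of patch embeddings
```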
Limiting Spatial Dimensions
Analyzing the similarity from the raw embeddings revealed an interesting challenge: the most similar tiles were often neighbouring tiles within the same city. To discover visually similar areas across different cities, I needed to limit this spatial correlation and focus embeddings on visual features within each tile, regardless of location.
I developed a script that identifies embedding dimensions most strongly correlated with geographic location. The process measures spatial autocorrelation within tiles and correlation between tile proximity and embedding similarity, then uses automatic cutoff detection to identify the most spatially biased dimensions. This excluded 61 (7.9%) of the 768 dimensions.
After this operation, the similarity search became more focused on visual patterns across different cities - highlighting tree-heavy areas, parts of cities with rivers, harbours, industrial zones, and other distinctive urban features that transcend geographic boundaries.
Aggregating Methods
I experimented with different methods for aggregating the 196 raw patch embeddings into one vector per tile for similarity search. Simple operations like mean, median, min, and max showed little variation in results, with median and max perhaps capturing the most useful features. Each method reveals slightly different aspects of urban similarity, though results are broadly consistent across approaches.
One challenge is that within a single 224×224-pixel tile, urban areas can contain quite diverse land use, so aggregating across all patches would sometimes dilute the signal of any specific land use. To address this, I tried clustering the patches into an optimal number of clusters (usually 3 or 4) and aggregating, with a simple mean, only the patches belonging to the dominant cluster. This helped focus on one primary area type within a tile, though in other cases it might lose useful information compared to using all patches.
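The dominant-cluster aggregation can be sketched like this. A tiny NumPy k-means stands in for whatever clustering implementation the project actually uses; the 196 patch count matches the text, but the 64-dimensional toy embeddings do not:

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Very small k-means, just enough to illustrate the idea."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centre, then recompute centres
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def dominant_cluster_mean(patches, k=3):
    """Mean of only the patches in the largest cluster of one tile."""
    labels = kmeans_labels(patches, k)
    dominant = np.bincount(labels, minlength=k).argmax()
    return patches[labels == dominant].mean(axis=0)

patches = np.random.default_rng(7).normal(size=(196, 64)).astype(np.float32)
tile_vector = dominant_cluster_mean(patches)  # one vector per tile
```

The plain-mean baseline is simply `patches.mean(axis=0)`, which is what the dominant-cluster variant is compared against.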
I also experimented with different UMAP parameters for dimensionality reduction to separate tiles in interesting ways. The UMAP visualisation mostly separates cities by regions (desert cities, European cities, coastal areas clustering together), but sometimes captures global visual patterns - there's actually a distinct region in the embedding space where airports from around the world cluster together.
Interactive Web Application
To make these discoveries accessible and engaging, I built a full-stack web application that enables exploration of urban visual similarity. The React frontend integrates a Mapbox component displaying all the tiles used in analysis alongside a UMAP visualisation, allowing users to click any location and discover visually similar urban areas worldwide.
For fast similarity search, the backend queries a Qdrant vector database storing the filtered and aggregated embeddings. I've made two aggregation methods publicly accessible through the web application: mean and dominant cluster. The paired visualisation synchronises between the map and embedding space, providing an intuitive way to explore cities - you can click on a random neighbourhood in Sydney and might discover similar-looking areas in Rio de Janeiro or anywhere else in the world.
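The backend query amounts to a nearest-neighbour lookup over the stored tile vectors. Qdrant serves this with an approximate index, but the logic is equivalent to a brute-force cosine search, sketched here with toy dimensions and IDs:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 64)).astype(np.float32)  # stored tile vectors (toy)
# A query close to tile 42, as if the user clicked that tile on the map
query = db[42] + 0.01 * rng.normal(size=64).astype(np.float32)

# Cosine similarity of the query against every stored vector, then top-k
sims = (db @ query) / (np.linalg.norm(db, axis=1) * np.linalg.norm(query))
top_k = np.argsort(-sims)[:5]  # indices of the 5 most similar tiles
```

In the application the returned IDs map back to tile geometries, which are then highlighted on both the Mapbox map and the UMAP view.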
Current Status and Future Plans
At the 10m resolution, the embeddings capture broad patterns rather than fine-grained details within tiles, but this still effectively demonstrates how foundation models capture urban characteristics at scale. This is a work in progress that I plan to refine further and conduct more in-depth analysis on the embeddings from urban areas. Similarity search could likely be improved with more advanced aggregation methods and perhaps incorporating additional embedding filtering techniques.
The code, documentation, and interactive demo are all accessible through the GitHub repository. I hope this project demonstrates an interesting application of geospatial embeddings for urban analysis and inspires further work in this domain.

