arxiv:2501.02103

Using LSDB to enable large-scale catalog distribution, cross-matching, and analytics

Published on Oct 22, 2025

Abstract

The Vera C. Rubin Observatory's massive data challenges are being addressed through the HATS hierarchical tiling scheme and LSDB analysis package, which enable efficient parallel processing and scalable catalog analysis across multiple environments.

The Vera C. Rubin Observatory will generate an unprecedented volume of data, including approximately 60 petabytes of raw data and around 30 trillion observed sources, posing a significant challenge for both large-scale and end-user scientific analysis. As part of the LINCC Frameworks Project, we are addressing these challenges by developing the HATS (Hierarchical Adaptive Tiling Scheme) format and the LSDB analysis package. HATS partitions data adaptively using a hierarchical tiling system to balance file sizes, enabling efficient parallel analysis. Recent updates include improved metadata consistency, support for incremental updates, and enhanced compatibility with evolving datasets. LSDB complements HATS by providing a scalable, user-friendly interface for large catalog analysis, integrating spatial queries, crossmatching, and time-series tools while using Dask for parallelization. We have successfully demonstrated these tools on datasets such as the ZTF and Pan-STARRS data releases, in both cluster and cloud environments. We are involved in several ongoing collaborations to ensure alignment with community needs, and we plan to pursue IVOA standardization and to support upcoming Rubin, Euclid, and Roman data. We provide our code and materials at lsdb.io.
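
The adaptive partitioning described in the abstract can be illustrated with a small, self-contained sketch. This is not the HATS implementation. It assumes each source already carries a HEALPix nested-scheme pixel index at a fixed maximum order, and it relies only on the nested-scheme property that the pixel p at order k has children 4p through 4p+3 at order k+1. The function name adaptive_tiles and the threshold parameter are hypothetical, chosen for illustration. Dense regions of sky are split into smaller tiles while sparse regions stay coarse, which is what keeps per-tile row counts (and hence file sizes) roughly balanced.

```python
def adaptive_tiles(pixels, max_order, threshold):
    """Group sources into tiles of varying HEALPix order (illustrative only).

    pixels    : iterable of nested-scheme pixel indices at `max_order`
    max_order : finest order allowed for a tile
    threshold : hypothetical cap on rows per tile; a tile holding more
                rows is split into its four children, unless it is
                already at `max_order`

    Returns a dict mapping (order, pixel) -> number of sources.
    """
    tiles = {}

    def split(order, pix, members):
        # Keep the tile if it is small enough or cannot be refined further.
        if len(members) <= threshold or order == max_order:
            tiles[(order, pix)] = len(members)
            return
        # In the nested scheme, dropping two bits per order gives the
        # ancestor pixel, so this shift yields the index at order + 1.
        shift = 2 * (max_order - (order + 1))
        children = {}
        for p in members:
            children.setdefault(p >> shift, []).append(p)
        for child, sub in children.items():
            split(order + 1, child, sub)

    # Bucket sources into the order-0 base pixels (12 on the full sky).
    base = {}
    for p in pixels:
        base.setdefault(p >> (2 * max_order), []).append(p)
    for b, members in base.items():
        split(0, b, members)
    return tiles


if __name__ == "__main__":
    # Toy catalog: a crowded patch in base pixel 0, one source in base pixel 2.
    toy = [0] * 3 + [1] * 3 + [5] * 2 + [40]
    for (order, pix), n in sorted(adaptive_tiles(toy, 2, 4).items()):
        print(f"order={order} pixel={pix} rows={n}")
```

In the toy run, the crowded base pixel is refined down to order 2 while the lone source stays in an order-0 tile, so every tile ends up at or below the threshold. A real pipeline would also have to handle on-disk layout, metadata, and tiles that exceed the threshold at the finest order, none of which this sketch attempts.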
