metadata
license: mit
pipeline_tag: feature-extraction

🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

arXiv GitHub License: MIT

SpecCLIP is a contrastive + domain-preserving foundation model designed to align LAMOST LRS spectra with Gaia XP spectrophotometric data. It learns a general-purpose spectral embedding (768-dim) that supports:

  • Stellar parameter estimation
  • Cross-survey spectral translation (LAMOST LRS ⟷ Gaia XP)
  • Similarity retrieval across LAMOST LRS and Gaia XP spectra

For full documentation, installation instructions, examples, and end-to-end usage, please visit the GitHub repository: 👉 https://github.com/Xiaosheng-Zhao/SpecCLIP


🔧 Available Models

The following pretrained weights are included in this model repository:

| File | Description | Embedding Dim | Parameters |
| --- | --- | --- | --- |
| encoders/lrs_encoder.ckpt | LAMOST LRS masked transformer encoder | 768 | 43M |
| encoders/xp_encoder.ckpt | Gaia XP masked transformer encoder | 768 | 43M |
| encoders/xp_encoder_mlp.ckpt | Gaia XP autoencoder (MLP head) | 768 | 43M |
| specclip/specclip_model_base.ckpt | Gaia XP ⟷ LAMOST contrastive | 768 | 100M |
| specclip/specclip_model_predrecon_mlp.ckpt | CLIP alignment + pred+recon | 768 | 168M |
| specclip/specclip_model_split_mlp.ckpt | CLIP alignment + split pred/recon | 768 | 126M |
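
After downloading the repository, it can be useful to confirm that all six checkpoint files are in place before loading anything. The snippet below is a minimal sketch using only the Python standard library. The relative paths are copied from the table above; the helper name `missing_checkpoints` and the local directory passed to it are illustrative assumptions, not part of the SpecCLIP codebase.

```python
from pathlib import Path

# Relative paths copied from the "Available Models" table.
EXPECTED_CHECKPOINTS = [
    "encoders/lrs_encoder.ckpt",
    "encoders/xp_encoder.ckpt",
    "encoders/xp_encoder_mlp.ckpt",
    "specclip/specclip_model_base.ckpt",
    "specclip/specclip_model_predrecon_mlp.ckpt",
    "specclip/specclip_model_split_mlp.ckpt",
]


def missing_checkpoints(root):
    """Return the expected checkpoint paths that are not files under `root`.

    `root` is a hypothetical local directory holding a copy of this model repo.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]
```

Calling `missing_checkpoints(".")` from the directory containing the downloaded files returns an empty list when everything is present.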

🧠 What the Model Does

SpecCLIP consists of:

  • Two masked transformer encoders (one for LAMOST LRS, one for Gaia XP)
  • Contrastive alignment loss (CLIP-style)
  • Domain-preserving prediction & reconstruction heads
  • Cross-modal decoder for spectrum translation

It produces shared embeddings enabling multi-survey astrophysical analysis.
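
To make the CLIP-style alignment concrete, here is a dependency-free sketch of a symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. It is a generic illustration of the technique, not SpecCLIP's actual training code: the function names are hypothetical, the temperature of 0.07 is an assumed value, and a real implementation would use PyTorch tensors rather than Python lists.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(lrs_emb, xp_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each input is assumed to be the same star."""
    a = [_normalize(v) for v in lrs_emb]
    b = [_normalize(v) for v in xp_emb]
    # logits[i][j]: cosine similarity of LRS i and XP j, divided by the temperature
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]  # XP-to-LRS direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

The loss is small when each LRS embedding is closest to its own Gaia XP partner and grows when pairs are mismatched, which is what pulls the two modalities into a shared space.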


Sample Usage

The following examples are adapted from the official GitHub repository.

Installation

First, create a conda environment and install requirements:

conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
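
Once the environment is set up, a quick check that the pinned versions were actually installed can save debugging time later. The sketch below uses only the standard library. The version pins are copied from the conda commands above; the helper names are illustrative, and the distribution names are an assumption (conda's `pytorch` package is normally visible to Python as `torch`).

```python
from importlib import metadata

# Pins copied from the conda commands; distribution names are assumed.
PINNED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "numpy": "2.0.1",
    "scipy": "1.15.3",
    "pandas": "2.3.3",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED, lookup=installed_version):
    """Map each package that deviates from its pin to the version found (or None)."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Some builds carry a local suffix such as "+cu118", so accept that too.
        if found is None or not (found == wanted or found.startswith(wanted + "+")):
            problems[name] = found
    return problems
```

Running `check_pins()` inside the activated environment returns an empty dictionary when every pinned package matches.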

Spectral Translation

Predict Gaia XP spectrum from LAMOST LRS:

import json
from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data

# Configuration
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)

# Load the external spectra data
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Predict corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)

# Plot
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)

Spectral Similarity Search

Find the top-4 most similar stars from the Gaia XP catalog:

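# Reuses `retriever` and `load_spectrum_data` from the Spectral Translation example.
# Download test data only (the leading "!" runs a shell command from a Jupyter notebook)
!python download_and_setup.py --test-data-only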

# Build embedding database from test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')

# Load external LAMOST spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Find similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)

# Plot
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
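
Conceptually, cross-modal retrieval of this kind amounts to a nearest-neighbor search in the shared embedding space. The following is a minimal, dependency-free sketch of top-k cosine-similarity search. It does not reproduce the internals of `SpectralRetriever`; the function name `top_k_similar` and the list-of-lists representation are assumptions made for illustration.

```python
import math


def top_k_similar(query, database, k=4):
    """Return (index, cosine similarity) pairs for the k rows closest to `query`."""

    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = unit(query)
    # Score every database embedding against the query embedding.
    scores = [
        (i, sum(a * b for a, b in zip(q, unit(row))))
        for i, row in enumerate(database)
    ]
    # Highest cosine similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Here `query` would be the embedding of a LAMOST LRS spectrum and `database` the embeddings of the Gaia XP spectra; for large catalogs a vectorized or approximate nearest-neighbor search would be the more practical choice.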

Parameter Prediction

Coming soon. This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.


📄 Full Documentation

To keep the Hugging Face card concise, all detailed instructions, including:

  • Installation
  • Parameter prediction
  • Spectral translation
  • Retrieval
  • Full examples (Python + figures)
  • Acknowledgments

are available at the GitHub repo:

👉 https://github.com/Xiaosheng-Zhao/SpecCLIP


📊 Citation

@ARTICLE{2025arXiv250701939Z,
       author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and
                 {Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and
                 {Ting}, Yuan-Sen and {Luo}, A-Li},
        title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
      journal = {arXiv e-prints},
     keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics,
                 Artificial Intelligence, Machine Learning},
         year = 2025,
        month = jul,
          eid = {arXiv:2507.01939},
        pages = {arXiv:2507.01939},
          doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
       eprint = {2507.01939},
 primaryClass = {astro-ph.IM},
}

📬 Contact