license: mit
pipeline_tag: feature-extraction
SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
SpecCLIP is a contrastive + domain-preserving foundation model designed to align LAMOST LRS spectra with Gaia XP spectrophotometric data. It learns a general-purpose spectral embedding (768-dim) that supports:
- Stellar parameter estimation
- Cross-survey spectral translation (LAMOST LRS ↔ Gaia XP)
- Similarity retrieval across LAMOST LRS and Gaia XP spectra
For full documentation, installation instructions, examples, and end-to-end usage, please visit the GitHub repository: https://github.com/Xiaosheng-Zhao/SpecCLIP
Available Models
The following pretrained weights are included in this model repository:
| File | Description | Embedding Dim | Params |
|---|---|---|---|
| encoders/lrs_encoder.ckpt | LAMOST LRS masked transformer encoder | 768 | 43M |
| encoders/xp_encoder.ckpt | Gaia XP masked transformer encoder | 768 | 43M |
| encoders/xp_encoder_mlp.ckpt | Gaia XP autoencoder (MLP head) | 768 | 43M |
| specclip/specclip_model_base.ckpt | Gaia XP ↔ LAMOST contrastive | 768 | 100M |
| specclip/specclip_model_predrecon_mlp.ckpt | CLIP alignment + pred+recon | 768 | 168M |
| specclip/specclip_model_split_mlp.ckpt | CLIP alignment + split pred/recon | 768 | 126M |
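A single checkpoint from the table can be fetched with `huggingface_hub.hf_hub_download`. The sketch below is illustrative only: the short keys in `CHECKPOINTS` are hypothetical names chosen here, and `repo_id` is left as a caller-supplied argument because this card does not state the repository's Hub id. How each checkpoint is then loaded into a model is covered in the GitHub repository, not here.

```python
# File paths exactly as listed in the table above.
# The dictionary keys are hypothetical short names, not part of SpecCLIP.
CHECKPOINTS = {
    "lrs_encoder": "encoders/lrs_encoder.ckpt",
    "xp_encoder": "encoders/xp_encoder.ckpt",
    "xp_encoder_mlp": "encoders/xp_encoder_mlp.ckpt",
    "specclip_base": "specclip/specclip_model_base.ckpt",
    "specclip_predrecon_mlp": "specclip/specclip_model_predrecon_mlp.ckpt",
    "specclip_split_mlp": "specclip/specclip_model_split_mlp.ckpt",
}


def fetch_checkpoint(name: str, repo_id: str) -> str:
    """Download one checkpoint and return its path in the local cache.

    `repo_id` is the Hub id of this model repository ("<user>/<repo>").
    It is not stated on this card, so the caller must supply it.
    """
    # Imported lazily so the table above can be inspected without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=CHECKPOINTS[name])
```

Downloading only the file you need avoids pulling all six checkpoints, the largest of which has 168M parameters.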
What the Model Does
SpecCLIP consists of:
- Two masked transformer encoders, one for LAMOST LRS and one for Gaia XP
- Contrastive alignment loss (CLIP-style)
- Domain-preserving prediction & reconstruction heads
- Cross-modal decoder for spectrum translation
It produces shared embeddings enabling multi-survey astrophysical analysis.
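To make the "CLIP-style" alignment term concrete, here is a minimal NumPy sketch of a symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. It is a generic illustration, not the SpecCLIP training code: the function name, the fixed `temperature=0.07`, and the use of NumPy rather than PyTorch are all assumptions made for this example.

```python
import numpy as np


def clip_contrastive_loss(lrs_emb, xp_emb, temperature=0.07):
    """Symmetric InfoNCE loss for paired embeddings of shape (batch, dim).

    Row i of `lrs_emb` and row i of `xp_emb` are assumed to describe the
    same star; every other row in the batch acts as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    lrs = lrs_emb / np.linalg.norm(lrs_emb, axis=1, keepdims=True)
    xp = xp_emb / np.linalg.norm(xp_emb, axis=1, keepdims=True)
    logits = lrs @ xp.T / temperature  # shape (batch, batch)

    def cross_entropy_on_diagonal(z):
        # Numerically stable log-softmax; the correct class is the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the LRS-to-XP and XP-to-LRS directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

The loss is small when each LAMOST embedding is closest to its own Gaia XP counterpart and large when matched pairs are no more similar than random pairs, which is what pulls the two surveys into a shared space.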
Sample Usage
The following examples are adapted from the official GitHub repository.
Installation
First, create a conda environment and install requirements:
conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
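After installation, it can be useful to confirm that the environment matches the versions pinned above. The following is a small standard-library sketch; `PINNED` and `check_pins` are names introduced here for illustration and are not part of SpecCLIP.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the conda commands above (conda's "pytorch" package
# is exposed to Python as the "torch" distribution).
PINNED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "numpy": "2.0.1",
    "scipy": "1.15.3",
    "pandas": "2.3.3",
}


def check_pins(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Some builds append a local tag such as "+cu118"; ignore it.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version.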
Spectral Translation
Predict Gaia XP spectrum from LAMOST LRS:
import json
from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data
# Load the retriever configuration
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)
# Load the external LAMOST LRS spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
# Predict the corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)
# Plot the prediction
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)
Spectral Similarity Search
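Find the four most similar stars in the Gaia XP catalog (this reuses the `retriever` created in the Spectral Translation example; the `!` prefix runs a shell command from a notebook cell):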
# Download test data only
!python download_and_setup.py --test-data-only
# Build embedding database from test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')
# Load external LAMOST spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
# Find similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)
# Plot
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
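Conceptually, a cross-modal search of this kind ranks database embeddings by their similarity to the query embedding. The sketch below shows a plain cosine-similarity top-k search in NumPy. It is an assumption that `find_similar_spectra` works this way internally; `top_k_cosine` is a hypothetical helper, and the layout of the saved `.npz` embedding file is not documented on this card.

```python
import numpy as np


def top_k_cosine(query, database, k=4):
    """Return (indices, scores) of the k rows of `database` closest to `query`.

    `query` has shape (dim,) and `database` has shape (n, dim); both are
    assumed to live in the same shared embedding space.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                        # cosine similarity per row
    order = np.argsort(scores)[::-1][:k]   # highest similarity first
    return order, scores[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random placeholders standing in for 768-dim Gaia XP embeddings.
    gaia_embeddings = rng.normal(size=(1000, 768))
    lamost_query = rng.normal(size=768)
    indices, scores = top_k_cosine(lamost_query, gaia_embeddings, k=4)
    print(indices, scores)
```

For a catalog too large to hold in memory, the same ranking would typically be delegated to an approximate nearest-neighbour index rather than a dense matrix product.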
Parameter Prediction
Coming soon. This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
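As a rough illustration of the idea, a downstream model maps a 768-dimensional embedding to one or more stellar labels. The sketch below uses closed-form ridge regression as a simple linear stand-in for the MLP or SBI models mentioned above; it is not the authors' method, and `fit_ridge`, `predict`, and the choice of `alpha` are assumptions introduced for this example.

```python
import numpy as np


def fit_ridge(embeddings, targets, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept.

    `embeddings` has shape (n, dim) and `targets` has shape (n, n_labels).
    """
    x_mean = embeddings.mean(axis=0)
    y_mean = targets.mean(axis=0)
    xc = embeddings - x_mean
    yc = targets - y_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for the weights W.
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(embeddings, weights, intercept):
    """Apply the fitted linear map to new embeddings."""
    return embeddings @ weights + intercept
```

In practice the training targets would come from a labelled reference catalog, and a linear probe like this mainly serves as a quick check of how much label information the embeddings carry before fitting a nonlinear model.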
Full Documentation
To keep the Hugging Face card concise, all detailed instructions, including:
- Installation
- Parameter prediction
- Spectral translation
- Retrieval
- Full examples (Python + figures)
- Acknowledgments
are available at the GitHub repo:
https://github.com/Xiaosheng-Zhao/SpecCLIP
Citation
@ARTICLE{2025arXiv250701939Z,
author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and
{Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and
{Ting}, Yuan-Sen and {Luo}, A-Li},
title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
journal = {arXiv e-prints},
keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics,
Artificial Intelligence, Machine Learning},
year = 2025,
month = jul,
eid = {arXiv:2507.01939},
pages = {arXiv:2507.01939},
doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
eprint = {2507.01939},
primaryClass = {astro-ph.IM},
}
Contact
- GitHub Issues: https://github.com/Xiaosheng-Zhao/SpecCLIP/issues
- Email: xzhao113@jh.edu