---
license: mit
pipeline_tag: feature-extraction
---

# 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

[![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939) [![GitHub](https://img.shields.io/badge/GitHub-Repo-black)](https://github.com/Xiaosheng-Zhao/SpecCLIP) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/Xiaosheng-Zhao/SpecCLIP/blob/main/LICENSE)

**SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data. It learns a **general-purpose 768-dimensional spectral embedding** that supports:

* **Stellar parameter estimation**
* **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
* **Similarity retrieval** across LAMOST LRS and Gaia XP spectra

For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
👉 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)

---

## 🔧 Available Models

The following pretrained weights are included in this model repository:

| File | Description | Embedding Dim | Params |
| -------------------------------------------- | ------------------------------------- | ------------- | ------ |
| `encoders/lrs_encoder.ckpt` | LAMOST LRS masked transformer encoder | 768 | 43M |
| `encoders/xp_encoder.ckpt` | Gaia XP masked transformer encoder | 768 | 43M |
| `encoders/xp_encoder_mlp.ckpt` | Gaia XP autoencoder (MLP head) | 768 | 43M |
| `specclip/specclip_model_base.ckpt` | Gaia XP ⟷ LAMOST contrastive | 768 | 100M |
| `specclip/specclip_model_predrecon_mlp.ckpt` | CLIP alignment + pred+recon | 768 | 168M |
| `specclip/specclip_model_split_mlp.ckpt` | CLIP alignment + split pred/recon | 768 | 126M |

---

## 🧠 What the Model Does

SpecCLIP consists of:

* **Two masked transformer encoders**
  * LAMOST LRS
  * Gaia XP
* **Contrastive alignment loss (CLIP-style)**
* **Domain-preserving prediction & reconstruction heads**
* **Cross-modal decoder** for spectrum translation

It produces **shared embeddings** that enable multi-survey astrophysical analysis.

---

## Sample Usage

The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).

### Installation

First, create a conda environment and install the requirements:

```bash
conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
```

### Spectral Translation

Predict a Gaia XP spectrum from a LAMOST LRS spectrum:

```python
import json

from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data

# Configuration
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)

# Load the external spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Predict the corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)

# Plot
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)
```

### Spectral Similarity Search

Find the top-4 most similar stars in the Gaia XP catalog:

```python
# Download test data only
!python download_and_setup.py --test-data-only

# Build the embedding database from the test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')

# Load an external LAMOST spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Find similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)

# Plot
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
```

### Parameter Prediction

**Coming soon.** This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.

---

## 📄 Full Documentation

To keep this Hugging Face card concise, **all detailed instructions**, including:

* Installation
* Parameter prediction
* Spectral translation
* Retrieval
* Full examples (Python + figures)
* Acknowledgments

are available at the GitHub repo:
👉 **[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)**

---

## 📊 Citation

```bibtex
@ARTICLE{2025arXiv250701939Z,
       author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and {Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and {Ting}, Yuan-Sen and {Luo}, A-Li},
        title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
      journal = {arXiv e-prints},
     keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics, Artificial Intelligence, Machine Learning},
         year = 2025,
        month = jul,
          eid = {arXiv:2507.01939},
        pages = {arXiv:2507.01939},
          doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
       eprint = {2507.01939},
 primaryClass = {astro-ph.IM},
}
```

---

## 📬 Contact

* GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
* Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)
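---

The CLIP-style contrastive alignment described under "What the Model Does" pairs each LAMOST LRS embedding with its Gaia XP counterpart. As a minimal sketch of the general idea (a symmetric InfoNCE loss over a batch of paired embeddings, with random tensors standing in for real SpecCLIP embeddings; the function name and temperature value are illustrative assumptions, not the repository's implementation):

```python
import torch
import torch.nn.functional as F

def clip_style_loss(z_lrs, z_xp, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings (illustrative sketch only)."""
    z_lrs = F.normalize(z_lrs, dim=-1)
    z_xp = F.normalize(z_xp, dim=-1)
    logits = z_lrs @ z_xp.t() / temperature   # (B, B) cosine-similarity logits
    targets = torch.arange(z_lrs.size(0))     # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy batch: 8 random 768-dim "LRS" and "XP" embeddings
z_lrs = torch.randn(8, 768)
z_xp = torch.randn(8, 768)
loss = clip_style_loss(z_lrs, z_xp)
```

Minimizing this loss pulls matched LRS/XP pairs together in the shared embedding space while pushing mismatched pairs apart, which is what makes the cross-modal retrieval and translation examples above possible.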
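Until the official "Parameter Prediction" examples land, here is a minimal, hypothetical sketch of the downstream-MLP idea: train a small regression head on frozen SpecCLIP embeddings. The `ParamHead` class, hidden size, and the random embeddings/labels below are illustrative stand-ins, not the repository's API or data:

```python
import torch
import torch.nn as nn

class ParamHead(nn.Module):
    """Hypothetical downstream head mapping 768-dim embeddings to stellar parameters."""
    def __init__(self, embed_dim=768, n_params=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, n_params),  # e.g. Teff, logg, [Fe/H]
        )

    def forward(self, z):
        return self.net(z)

# Synthetic stand-ins for precomputed SpecCLIP embeddings and matched labels
emb = torch.randn(512, 768)
labels = torch.randn(512, 3)

head = ParamHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Short illustrative training loop (the encoder stays frozen; only the head trains)
for step in range(100):
    opt.zero_grad()
    loss = loss_fn(head(emb), labels)
    loss.backward()
    opt.step()

with torch.no_grad():
    preds = head(emb)  # shape: (512, 3)
```

In practice the labels would come from a cross-matched catalog, and the repository also mentions SBI-based alternatives to a plain MLP.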