---
license: mit
pipeline_tag: feature-extraction
---
# 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
[![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-black)](https://github.com/Xiaosheng-Zhao/SpecCLIP)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/Xiaosheng-Zhao/SpecCLIP/blob/main/LICENSE)
**SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
It learns a **general-purpose spectral embedding (768-dim)** that supports:
* **Stellar parameter estimation**
* **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
* **Similarity retrieval** across LAMOST LRS and Gaia XP spectra
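As an illustration of how a 768-dim embedding supports similarity retrieval, the sketch below ranks a database of embeddings by cosine similarity to a query. This is a generic, self-contained example with a hypothetical helper (`top_k_similar` is not part of the SpecCLIP API); the repository's `SpectralRetriever` provides the real interface.

```python
import numpy as np

def top_k_similar(query_emb, database_embs, k=4):
    """Indices of the k database embeddings most similar to the query,
    ranked by cosine similarity (hypothetical helper, not the SpecCLIP API)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarities, shape (N,)
    return np.argsort(-sims)[:k]       # highest similarity first

# Toy example: 10 random 768-dim embeddings
rng = np.random.default_rng(0)
db = rng.normal(size=(10, 768))
query = db[3] + 0.01 * rng.normal(size=768)  # near-duplicate of entry 3
print(top_k_similar(query, db, k=4)[0])      # entry 3 ranks first
```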
For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
πŸ‘‰ [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
---
## πŸ”§ Available Models
The following pretrained weights are included in this model repository:
| File | Description | Embedding Dim | Params |
| -------------------------------------------- | ------------------------------------- | ------------- | ------|
| `encoders/lrs_encoder.ckpt` | LAMOST LRS masked transformer encoder | 768 | 43M |
| `encoders/xp_encoder.ckpt` | Gaia XP masked transformer encoder | 768 | 43M |
| `encoders/xp_encoder_mlp.ckpt` | Gaia XP autoencoder (MLP head) | 768 | 43M |
| `specclip/specclip_model_base.ckpt` | Gaia XP ⟷ LAMOST contrastive | 768 | 100M |
| `specclip/specclip_model_predrecon_mlp.ckpt` | CLIP alignment + pred+recon | 768 | 168M |
| `specclip/specclip_model_split_mlp.ckpt` | CLIP alignment + split pred/recon | 768 | 126M |
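To sanity-check a downloaded checkpoint (e.g. confirm the parameter counts above), it can be opened directly with `torch.load`. The `"state_dict"` key layout below is an assumption (Lightning-style checkpoints); use the repository's own loading utilities for actual inference. The demo writes a tiny stand-in checkpoint so the snippet is self-contained.

```python
import torch

def count_parameters(ckpt_path):
    """Total parameter count in a checkpoint file.
    Assumes a Lightning-style 'state_dict' key, falling back to a raw state dict."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt)
    return sum(t.numel() for t in state_dict.values())

# Demo with a tiny stand-in checkpoint (the real files live in this repo)
demo = {"state_dict": {"w": torch.zeros(768, 768), "b": torch.zeros(768)}}
torch.save(demo, "/tmp/demo.ckpt")
print(count_parameters("/tmp/demo.ckpt"))  # 768*768 + 768 = 590592
```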
---
## 🧠 What the Model Does
SpecCLIP consists of:
* **Two masked transformer encoders**
– LAMOST LRS
– Gaia XP
* **Contrastive alignment loss (CLIP-style)**
* **Domain-preserving prediction & reconstruction heads**
* **Cross-modal decoder** for spectrum translation
It produces **shared embeddings** that enable multi-survey astrophysical analysis.
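The contrastive alignment component can be sketched as a standard symmetric CLIP-style InfoNCE loss over a batch of paired embeddings. This is a generic illustration of the technique, not the exact SpecCLIP objective; the temperature value and function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_loss(lrs_emb, xp_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired embeddings
    (generic CLIP-style sketch, not the exact SpecCLIP objective)."""
    lrs = F.normalize(lrs_emb, dim=-1)
    xp = F.normalize(xp_emb, dim=-1)
    logits = lrs @ xp.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(len(logits))       # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy batch of 8 paired 768-dim embeddings
torch.manual_seed(0)
lrs = torch.randn(8, 768)
loss = clip_loss(lrs, lrs.clone())  # perfectly matched pairs -> near-zero loss
print(float(loss))
```

Minimizing this loss pulls matched LAMOST/Gaia pairs together on the diagonal while pushing mismatched pairs apart; the domain-preserving heads then keep each encoder faithful to its own survey.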
---
## Sample Usage
The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).
### Installation
First, create a conda environment and install requirements:
```bash
conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
```
### Spectral Translation
Predict a Gaia XP spectrum from a LAMOST LRS spectrum:
```python
import json

from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data

# Configuration
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)

# Load the external spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Predict the corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)

# Plot
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)
```
### Spectral Similarity Search
Find the top-4 most similar stars in the Gaia XP catalog:
```python
# Download the test data only (Jupyter shell escape; drop the "!" in a terminal)
!python download_and_setup.py --test-data-only

# Build the embedding database from the test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')

# Load an external LAMOST spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Find similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)

# Plot
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
```
### Parameter Prediction
**Coming soon.**
This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
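Until the official examples land, the general pattern can be sketched as a small downstream head on top of frozen 768-dim embeddings. The architecture, layer sizes, and the choice of three output labels (e.g. Teff, log g, [Fe/H]) below are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical downstream head: a small MLP mapping 768-dim SpecCLIP
# embeddings to three stellar labels (e.g. Teff, log g, [Fe/H]).
# Architecture and label choice are illustrative assumptions.
head = nn.Sequential(
    nn.Linear(768, 256),
    nn.GELU(),
    nn.Linear(256, 3),
)

embeddings = torch.randn(16, 768)  # stand-in for real SpecCLIP embeddings
labels = head(embeddings)
print(labels.shape)  # torch.Size([16, 3])
```

In practice the head would be trained with a regression loss (e.g. MSE) against catalog labels while the SpecCLIP encoders stay frozen.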
---
## πŸ“„ Full Documentation
To keep this Hugging Face model card concise, **all detailed instructions**, including:
* Installation
* Parameter prediction
* Spectral translation
* Retrieval
* Full examples (Python + figures)
* Acknowledgments
are available in the GitHub repository:
πŸ‘‰ **[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)**
---
## πŸ“Š Citation
```bibtex
@ARTICLE{2025arXiv250701939Z,
author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and
{Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and
{Ting}, Yuan-Sen and {Luo}, A-Li},
title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
journal = {arXiv e-prints},
keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics,
Artificial Intelligence, Machine Learning},
year = 2025,
month = jul,
eid = {arXiv:2507.01939},
pages = {arXiv:2507.01939},
          doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
eprint = {2507.01939},
primaryClass = {astro-ph.IM},
}
```
---
## πŸ“¬ Contact
* GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
* Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)