---
license: mit
pipeline_tag: feature-extraction
---
# SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
[Paper (arXiv:2507.01939)](https://arxiv.org/abs/2507.01939) | [Code (GitHub)](https://github.com/Xiaosheng-Zhao/SpecCLIP) | [License (MIT)](https://github.com/Xiaosheng-Zhao/SpecCLIP/blob/main/LICENSE)

**SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
It learns a **general-purpose spectral embedding (768-dim)** that supports:
* **Stellar parameter estimation**
* **Cross-survey spectral translation** (LAMOST LRS ↔ Gaia XP)
* **Similarity retrieval** across LAMOST LRS and Gaia XP spectra
For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
---
## Available Models
The following pretrained weights are included in this model repository:
| File                                         | Description                           | Embedding Dim | Parameters |
| -------------------------------------------- | ------------------------------------- | ------------- | ---------- |
| `encoders/lrs_encoder.ckpt`                  | LAMOST LRS masked transformer encoder | 768           | 43M        |
| `encoders/xp_encoder.ckpt`                   | Gaia XP masked transformer encoder    | 768           | 43M        |
| `encoders/xp_encoder_mlp.ckpt`               | Gaia XP autoencoder (MLP head)        | 768           | 43M        |
| `specclip/specclip_model_base.ckpt`          | Gaia XP ↔ LAMOST contrastive          | 768           | 100M       |
| `specclip/specclip_model_predrecon_mlp.ckpt` | CLIP alignment + pred+recon           | 768           | 168M       |
| `specclip/specclip_model_split_mlp.ckpt`     | CLIP alignment + split pred/recon     | 768           | 126M       |
---
## What the Model Does
SpecCLIP consists of:
* **Two masked transformer encoders**
  * LAMOST LRS
  * Gaia XP
* **Contrastive alignment loss (CLIP-style)**
* **Domain-preserving prediction & reconstruction heads**
* **Cross-modal decoder** for spectrum translation
It produces **shared embeddings** enabling multi-survey astrophysical analysis.
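
As a rough illustration of the CLIP-style alignment term listed above, the sketch below computes a symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. It is a minimal NumPy sketch, not the repository's training code: the function names, the temperature of 0.07, and the random stand-in embeddings are all assumptions made for this example.

```python
import numpy as np


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(lrs_emb: np.ndarray, xp_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for paired embeddings of shape (batch, dim)."""
    # L2-normalise so that dot products are cosine similarities.
    lrs = lrs_emb / np.linalg.norm(lrs_emb, axis=1, keepdims=True)
    xp = xp_emb / np.linalg.norm(xp_emb, axis=1, keepdims=True)

    # logits[i, j] compares LRS spectrum i with XP spectrum j.
    logits = lrs @ xp.T / temperature

    # Matched pairs sit on the diagonal; score them in both directions.
    loss_lrs_to_xp = -np.diag(log_softmax(logits, axis=1)).mean()
    loss_xp_to_lrs = -np.diag(log_softmax(logits, axis=0)).mean()
    return float((loss_lrs_to_xp + loss_xp_to_lrs) / 2)


# Hypothetical stand-in embeddings (random, not real spectra).
rng = np.random.default_rng(0)
lrs_batch = rng.normal(size=(8, 768))
xp_aligned = lrs_batch + 0.01 * rng.normal(size=(8, 768))  # near-identical partners
xp_random = rng.normal(size=(8, 768))                      # unrelated partners

aligned_loss = clip_style_loss(lrs_batch, xp_aligned)
random_loss = clip_style_loss(lrs_batch, xp_random)
print(f"aligned pairs: {aligned_loss:.4f}, random pairs: {random_loss:.4f}")
```

The loss is small when each LAMOST LRS embedding is closest to its own Gaia XP partner and grows toward `log(batch_size)` when the pairing carries no information, which is the behaviour a contrastive alignment objective rewards.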
---
## Sample Usage
The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).
### Installation
Create a conda environment and install the requirements. The `pip` commands assume you are in the root of a local clone of the GitHub repository:
```bash
conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
```
### Spectral Translation
Predict a Gaia XP spectrum from a LAMOST LRS spectrum:
```python
import json

from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data

# Load the retrieval configuration
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)

# Load an external LAMOST LRS spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Predict the corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)

# Plot the prediction
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)
```
### Spectral Similarity Search
Find the four most similar stars in the Gaia XP catalog:
```python
# Reuses `retriever` and `load_spectrum_data` from the Spectral Translation example.

# Download test data only (notebook shell escape; in a terminal, drop the leading `!`)
!python download_and_setup.py --test-data-only

# Build the embedding database from the test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')

# Load an external LAMOST LRS spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Find the most similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)

# Plot the retrieval results
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
```
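
To make the idea behind cross-modal retrieval concrete, the sketch below ranks a database of embeddings by cosine similarity to a query embedding and returns the top matches. It is a conceptual NumPy sketch only: `top_k_cosine` is a hypothetical helper, the embeddings are random stand-ins, and the similarity measure used inside `SpectralRetriever` may differ.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, database: np.ndarray, k: int = 4):
    """Return (indices, similarities) of the k database rows closest to query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                  # cosine similarity to every database entry
    order = np.argsort(-sims)[:k]  # highest similarity first
    return order, sims[order]


# Hypothetical stand-in data (random vectors, not real spectra).
rng = np.random.default_rng(42)
xp_database = rng.normal(size=(1000, 768))  # plays the role of Gaia XP embeddings
# Query: a noisy copy of database row 123, playing the role of a LAMOST embedding.
lrs_query = xp_database[123] + 0.1 * rng.normal(size=768)

indices, similarities = top_k_cosine(lrs_query, xp_database, k=4)
print(indices, similarities)
```

Because both modalities are embedded in the same 768-dimensional space, a LAMOST LRS query can be compared directly against Gaia XP entries without translating the spectrum first.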
### Parameter Prediction
**Coming soon.**
This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
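
Until the official example is published, the general pattern of fitting a downstream model on frozen embeddings can be sketched with a simple linear probe. The code below is an illustration only and not the authors' pipeline: it swaps in closed-form ridge regression for the MLP and SBI models mentioned above, and both the embeddings and the target values are synthetic.

```python
import numpy as np


def add_bias(features: np.ndarray) -> np.ndarray:
    """Append a constant column so the model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge(features: np.ndarray, targets: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression on frozen embeddings; returns the weights."""
    X = add_bias(features)
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ targets)


def predict_ridge(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    return add_bias(features) @ weights


# Synthetic stand-ins: random "embeddings" and a made-up linear target.
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(2000, 768))
true_weights = rng.normal(size=768) / np.sqrt(768)
labels = embeddings @ true_weights + 0.1 * rng.normal(size=2000)

# Hold out the last 400 rows to check generalisation.
train_X, test_X = embeddings[:1600], embeddings[1600:]
train_y, test_y = labels[:1600], labels[1600:]

weights = fit_ridge(train_X, train_y, alpha=1.0)
rmse = float(np.sqrt(np.mean((predict_ridge(test_X, weights) - test_y) ** 2)))
print(f"held-out RMSE: {rmse:.3f}")
```

In practice the embeddings would come from one of the pretrained encoders and the labels from a reference catalog of stellar parameters; a nonlinear model would replace the ridge step.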
---
## Full Documentation
To keep the Hugging Face card concise, **all detailed instructions**, including:
* Installation
* Parameter prediction
* Spectral translation
* Retrieval
* Full examples (Python + figures)
* Acknowledgments
are available at the GitHub repo:
**[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)**
---
## Citation
```bibtex
@ARTICLE{2025arXiv250701939Z,
author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and
{Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and
{Ting}, Yuan-Sen and {Luo}, A-Li},
title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
journal = {arXiv e-prints},
keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics,
Artificial Intelligence, Machine Learning},
year = 2025,
month = jul,
eid = {arXiv:2507.01939},
pages = {arXiv:2507.01939},
doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
eprint = {2507.01939},
primaryClass = {astro-ph.IM},
}
```
---
## Contact
* GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
* Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)