---
license: mit
pipeline_tag: feature-extraction
---

# 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

[![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-black)](https://github.com/Xiaosheng-Zhao/SpecCLIP)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/Xiaosheng-Zhao/SpecCLIP/blob/main/LICENSE)

**SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
It learns a **general-purpose spectral embedding (768-dim)** that supports:

*   **Stellar parameter estimation**
*   **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
*   **Similarity retrieval** across LAMOST LRS and Gaia XP spectra

For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
👉 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)

---

## 🔧 Available Models

The following pretrained weights are included in this model repository:

| File                                         | Description                           | Embedding Dim | Params |
| -------------------------------------------- | ------------------------------------- | ------------- | ------ |
| `encoders/lrs_encoder.ckpt`                  | LAMOST LRS masked transformer encoder | 768           | 43M    |
| `encoders/xp_encoder.ckpt`                   | Gaia XP masked transformer encoder    | 768           | 43M    |
| `encoders/xp_encoder_mlp.ckpt`               | Gaia XP autoencoder (MLP head)        | 768           | 43M    |
| `specclip/specclip_model_base.ckpt`          | Gaia XP ⟷ LAMOST LRS contrastive      | 768           | 100M   |
| `specclip/specclip_model_predrecon_mlp.ckpt` | CLIP alignment + pred + recon         | 768           | 168M   |
| `specclip/specclip_model_split_mlp.ckpt`     | CLIP alignment + split pred/recon     | 768           | 126M   |

---

## 🧠 What the Model Does

SpecCLIP consists of:

*   **Two masked transformer encoders**
    *   LAMOST LRS
    *   Gaia XP
*   **Contrastive alignment loss (CLIP-style)**
*   **Domain-preserving prediction & reconstruction heads**
*   **Cross-modal decoder** for spectrum translation

It produces **shared embeddings** enabling multi-survey astrophysical analysis.
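
The CLIP-style alignment objective can be illustrated with a short, self-contained sketch. This is plain NumPy with illustrative names, not the actual SpecCLIP training code:

```python
import numpy as np

def clip_contrastive_loss(z_lrs, z_xp, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    z_lrs, z_xp: (batch, dim) arrays from the two encoders, where
    row i of each array comes from the same star.
    """
    # L2-normalize so dot products are cosine similarities
    z_lrs = z_lrs / np.linalg.norm(z_lrs, axis=1, keepdims=True)
    z_xp = z_xp / np.linalg.norm(z_xp, axis=1, keepdims=True)
    logits = z_lrs @ z_xp.T / temperature  # (batch, batch); matched pairs on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)             # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))              # true class = diagonal entry

    # Average the LRS->XP and XP->LRS directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Matched pairs sit on the diagonal of the similarity matrix, so the loss falls as the two encoders map the same star to nearby points in the shared 768-dim space.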

---

## Sample Usage
The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).

### Installation

First, create a conda environment and install requirements:
```bash
conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
```

### Spectral Translation

Predict Gaia XP spectrum from LAMOST LRS:
```python
import json
from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data

# Configuration
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)

# Load the external spectra data
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Predict corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)

# Plot
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)
```

### Spectral Similarity Search

Find the top-4 most similar stars from Gaia XP catalog:
```python
# Download the test data (the leading "!" is notebook/IPython syntax;
# from a shell, run `python download_and_setup.py --test-data-only`)
!python download_and_setup.py --test-data-only

# Build embedding database from test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')

# Load external LAMOST spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Find similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)

# Plot
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
```
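
Conceptually, cross-modal retrieval is a nearest-neighbour search in the shared embedding space. A minimal NumPy sketch of cosine-similarity retrieval, using illustrative names and random stand-in data rather than real SpecCLIP embeddings:

```python
import numpy as np

def top_k_similar(query_emb, database_embs, k=4):
    """Indices and cosine similarities of the k database rows closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    sims = db @ q                       # cosine similarity of every row vs the query
    order = np.argsort(sims)[::-1][:k]  # highest similarity first
    return order, sims[order]

# Illustrative: a 768-dim query against a small random "database";
# row 123 is made a near-duplicate of the query, so it should rank first.
rng = np.random.default_rng(42)
database = rng.normal(size=(1000, 768))
query = database[123] + 0.01 * rng.normal(size=768)
idx, sims = top_k_similar(query, database, k=4)
```

`build_embedding_database` plays the role of precomputing `database`; the retriever then ranks candidates by similarity to the query embedding.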

### Parameter Prediction

**Coming soon.**
This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
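
Until those examples land, the idea can be sketched as a linear probe on the embeddings. Everything below is synthetic: random 768-dim arrays stand in for SpecCLIP embeddings, three made-up labels play the role of Teff, log g, [Fe/H], and a ridge-regression head stands in for the MLP/SBI heads mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_stars, dim, n_params = 500, 768, 3

# Synthetic stand-ins: random "embeddings" and linearly generated "labels"
X = rng.normal(size=(n_stars, dim))
W_true = rng.normal(size=(dim, n_params)) / np.sqrt(dim)
y = X @ W_true + 0.01 * rng.normal(size=(n_stars, n_params))

# Ridge-regression head: closed-form fit of a linear probe on the embeddings
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)
rmse = np.sqrt(((X @ W - y) ** 2).mean(axis=0))  # per-parameter training RMSE
```

A real workflow would replace `X` and `y` with SpecCLIP embeddings and catalogued stellar parameters, and would evaluate on a held-out split.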

---

## 📄 Full Documentation

To keep the Hugging Face card concise, **all detailed instructions**, including:

*   Installation
*   Parameter prediction
*   Spectral translation
*   Retrieval
*   Full examples (Python + figures)
*   Acknowledgments

are available at the GitHub repo:

👉 **[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)**

---

## 📊 Citation

```bibtex
@ARTICLE{2025arXiv250701939Z,
       author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and
                 {Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and
                 {Ting}, Yuan-Sen and {Luo}, A-Li},
        title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
      journal = {arXiv e-prints},
     keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics,
                 Artificial Intelligence, Machine Learning},
         year = 2025,
        month = jul,
          eid = {arXiv:2507.01939},
        pages = {arXiv:2507.01939},
          doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
       eprint = {2507.01939},
 primaryClass = {astro-ph.IM},
}
```

---

## 📬 Contact

*   GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
*   Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)