---
license: mit
pipeline_tag: feature-extraction
---

# SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

[Paper (arXiv:2507.01939)](https://arxiv.org/abs/2507.01939) · [Code (GitHub)](https://github.com/Xiaosheng-Zhao/SpecCLIP) · [License](https://github.com/Xiaosheng-Zhao/SpecCLIP/blob/main/LICENSE)

**SpecCLIP** is a contrastive, domain-preserving foundation model that aligns **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.

It learns a **general-purpose spectral embedding (768-dim)** that supports:

* **Stellar parameter estimation**
* **Cross-survey spectral translation** (LAMOST LRS ↔ Gaia XP)
* **Similarity retrieval** across LAMOST LRS and Gaia XP spectra

For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:

[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)

---

## Available Models

The following pretrained weights are included in this model repository:

| File                                         | Description                                  | Embedding dim | Parameters |
| -------------------------------------------- | -------------------------------------------- | ------------- | ---------- |
| `encoders/lrs_encoder.ckpt`                  | LAMOST LRS masked transformer encoder        | 768           | 43M        |
| `encoders/xp_encoder.ckpt`                   | Gaia XP masked transformer encoder           | 768           | 43M        |
| `encoders/xp_encoder_mlp.ckpt`               | Gaia XP autoencoder (MLP head)               | 768           | 43M        |
| `specclip/specclip_model_base.ckpt`          | Gaia XP ↔ LAMOST contrastive                 | 768           | 100M       |
| `specclip/specclip_model_predrecon_mlp.ckpt` | CLIP alignment + prediction + reconstruction | 768           | 168M       |
| `specclip/specclip_model_split_mlp.ckpt`     | CLIP alignment + split prediction/reconstruction | 768       | 126M       |

---

## What the Model Does

SpecCLIP consists of:

* **Two masked transformer encoders**
  * LAMOST LRS
  * Gaia XP
* **Contrastive alignment loss (CLIP-style)**
* **Domain-preserving prediction & reconstruction heads**
* **Cross-modal decoder** for spectrum translation

It produces a **shared embedding space** in which spectra from both surveys can be compared directly, enabling multi-survey astrophysical analysis.
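
To make the CLIP-style alignment term concrete, here is a minimal NumPy sketch of a symmetric contrastive (InfoNCE) loss over paired embeddings. It is an illustration of the general technique, not SpecCLIP's training code: the function names, the temperature of 0.07, and the random toy embeddings are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(lrs_emb: np.ndarray, xp_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of both arrays must describe the same star."""
    logits = l2_normalize(lrs_emb) @ l2_normalize(xp_emb).T / temperature
    diag = np.arange(len(logits))
    # Matching pairs sit on the diagonal; every other entry acts as a negative.
    loss_lrs_to_xp = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_xp_to_lrs = -log_softmax(logits, axis=0)[diag, diag].mean()
    return float((loss_lrs_to_xp + loss_xp_to_lrs) / 2)


# Toy data (hypothetical): 8 stars with 768-dim embeddings per modality.
rng = np.random.default_rng(0)
lrs = rng.normal(size=(8, 768))
aligned_loss = clip_style_loss(lrs, lrs + 0.01 * rng.normal(size=(8, 768)))
random_loss = clip_style_loss(lrs, rng.normal(size=(8, 768)))
print(aligned_loss < random_loss)
```

Well-aligned pairs drive the loss toward zero, while unrelated embeddings give a loss near the logarithm of the batch size, which is why minimizing it pulls the two modalities into a common space.
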

---

## Sample Usage

The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).

### Installation

Create a conda environment and install the requirements. Run the `pip` commands from the root of a local copy of the repository, since they rely on its `requirements.txt` and package files:

```bash
conda create -n specclip-ai python=3.10
conda activate specclip-ai
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
pip install -r requirements.txt
pip install -e .
```

### Spectral Translation

Predict the Gaia XP spectrum that corresponds to a LAMOST LRS spectrum:

```python
import json

from spectral_retrieval import SpectralRetriever
from predict_lrs_wclip_v0 import load_spectrum_data

# Build the retriever from the JSON configuration file
with open('config_retrieval.json', 'r') as f:
    config = json.load(f)
retriever = SpectralRetriever(**config)

# Load an external LAMOST LRS spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Predict the corresponding Gaia XP spectrum
prediction_external = retriever.predict_cross_modal(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra'
)

# Plot the prediction and save the figure
retriever.plot_cross_modal_prediction(
    prediction_external,
    save_path='./plots/external_lamost_to_gaia_prediction.png'
)
```

### Spectral Similarity Search

Find the four most similar stars in the Gaia XP catalog:

```python
import subprocess
import sys

# Assumes `retriever` and `load_spectrum_data` from the Spectral Translation example.

# Download test data only (in a notebook: !python download_and_setup.py --test-data-only)
subprocess.run([sys.executable, 'download_and_setup.py', '--test-data-only'], check=True)

# Build embedding database from test data
retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')

# Load external LAMOST spectrum
wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')

# Find similar Gaia XP spectra
results_external_cross = retriever.find_similar_spectra(
    query_spectrum=(wavelength, flux),
    query_type='lamost_spectra',
    search_type='cross_modal',
    top_k=4
)

# Plot the retrieved spectra and save the figure
retriever.plot_retrieval_results(
    results_external_cross,
    save_path='./plots/external_lamost_to_gaia_cross.png'
)
```
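
Conceptually, retrieval in a shared embedding space reduces to a nearest-neighbour search. The sketch below shows one common way to do that with cosine similarity in NumPy. It is not the implementation behind `find_similar_spectra`: the helper `top_k_cosine` is hypothetical, and the embeddings are random toy data standing in for a real database.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, database: np.ndarray, k: int = 4):
    """Return indices and cosine similarities of the k database rows closest to the query."""
    query = query / np.linalg.norm(query)
    database = database / np.linalg.norm(database, axis=1, keepdims=True)
    similarities = database @ query
    order = np.argsort(-similarities)[:k]  # highest similarity first
    return order, similarities[order]


# Toy data (hypothetical): 1000 "Gaia XP" embeddings and one "LAMOST" query
# built as a noisy copy of database entry 123.
rng = np.random.default_rng(42)
xp_embeddings = rng.normal(size=(1000, 768))
lrs_query = xp_embeddings[123] + 0.1 * rng.normal(size=768)

indices, scores = top_k_cosine(lrs_query, xp_embeddings, k=4)
print(indices, scores)
```

A brute-force matrix product like this is adequate for small test sets; for catalog-scale databases an approximate nearest-neighbour index would be the usual substitute.
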

### Parameter Prediction

**Coming soon.**

This section will include examples of using SpecCLIP embeddings with downstream models (e.g., an MLP or simulation-based inference, SBI) for stellar-parameter prediction.
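
As a generic illustration of what "a downstream model on embeddings" means, the sketch below fits a closed-form ridge regression from 768-dim vectors to three target values. Ridge regression is swapped in here as a simple stand-in for the MLP or SBI models mentioned above, and nothing in it is SpecCLIP code: `fit_ridge` is a hypothetical helper, and both the embeddings and the targets are synthetic, not real stellar labels.

```python
import numpy as np


def fit_ridge(embeddings: np.ndarray, targets: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression; centering handles the intercept without penalizing it."""
    x_mean, y_mean = embeddings.mean(axis=0), targets.mean(axis=0)
    xc, yc = embeddings - x_mean, targets - y_mean
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


# Synthetic data (assumption): random embeddings with a linear link to 3 targets.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 768))
true_weights = rng.normal(size=(768, 3))
targets = embeddings @ true_weights + 0.1 * rng.normal(size=(2000, 3))

# Fit on the first 1500 rows and evaluate on the remaining 500.
train, held_out = slice(0, 1500), slice(1500, 2000)
weights, intercept = fit_ridge(embeddings[train], targets[train])
predictions = embeddings[held_out] @ weights + intercept

residual = ((targets[held_out] - predictions) ** 2).sum()
total = ((targets[held_out] - targets[held_out].mean(axis=0)) ** 2).sum()
r2 = 1.0 - residual / total
print(f"held-out R^2 on synthetic data: {r2:.3f}")
```

With real embeddings, `embeddings` would come from one of the encoders and `targets` from a labelled catalog; the fit and evaluation steps stay the same.
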

---

## Full Documentation

To keep the Hugging Face card concise, **all detailed instructions** are available at the GitHub repo, including:

* Installation
* Parameter prediction
* Spectral translation
* Retrieval
* Full examples (Python + figures)
* Acknowledgments

**[https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)**

---

## Citation

```bibtex
@ARTICLE{2025arXiv250701939Z,
       author = {{Zhao}, Xiaosheng and {Huang}, Yang and {Xue}, Guirong and {Kong}, Xiao and
                 {Liu}, Jifeng and {Tang}, Xiaoyu and {Beers}, Timothy C. and
                 {Ting}, Yuan-Sen and {Luo}, A-Li},
        title = "{SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars}",
      journal = {arXiv e-prints},
     keywords = {Instrumentation and Methods for Astrophysics, Solar and Stellar Astrophysics,
                 Artificial Intelligence, Machine Learning},
         year = 2025,
        month = jul,
          eid = {arXiv:2507.01939},
        pages = {arXiv:2507.01939},
          doi = {10.48550/arXiv.2507.01939},
archivePrefix = {arXiv},
       eprint = {2507.01939},
 primaryClass = {astro-ph.IM},
}
```

---

## Contact

* GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
* Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)