Add metadata (license, pipeline tag) and usage examples

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +102 -19
README.md CHANGED
@@ -1,3 +1,8 @@
1
  # 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
2
 
3
  [![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939)
@@ -7,9 +12,9 @@
7
  **SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
8
  It learns a **general-purpose spectral embedding (768-dim)** that supports:
9
 
10
- * **Stellar parameter estimation**
11
- * **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
12
- * **Similarity retrieval** across LAMOST LRS and Gaia XP spectra
13
 
14
  For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
15
  👉 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
@@ -35,27 +40,106 @@ The following pretrained weights are included in this model repository:
35
 
36
  SpecCLIP consists of:
37
 
38
- * **Two masked transformer encoders**
39
- – LAMOST LRS
40
- – Gaia XP
41
- * **Contrastive alignment loss (CLIP-style)**
42
- * **Domain-preserving prediction & reconstruction heads**
43
- * **Cross-modal decoder** for spectrum translation
44
 
45
  It produces **shared embeddings** enabling multi-survey astrophysical analysis.
46
 
47
  ---
48
 
49
  ## 📄 Full Documentation
50
 
51
  To keep the Hugging Face card concise, **all detailed instructions**, including:
52
 
53
- * Installation
54
- * Parameter prediction
55
- * Spectral translation
56
- * Retrieval
57
- * Full examples (Python + figures)
58
- * Acknowledgments
59
 
60
  are available at the GitHub repo:
61
 
@@ -78,7 +162,7 @@ are available at the GitHub repo:
78
  month = jul,
79
  eid = {arXiv:2507.01939},
80
  pages = {arXiv:2507.01939},
81
- doi = {10.48550/arXiv.2507.01939},
82
  archivePrefix = {arXiv},
83
  eprint = {2507.01939},
84
  primaryClass = {astro-ph.IM},
@@ -89,6 +173,5 @@ archivePrefix = {arXiv},
89
 
90
  ## 📬 Contact
91
 
92
- * GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
93
- * Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)
94
-
 
1
+ ---
2
+ license: mit
3
+ pipeline_tag: feature-extraction
4
+ ---
5
+
6
  # 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
7
 
8
  [![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939)
 
12
  **SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
13
  It learns a **general-purpose spectral embedding (768-dim)** that supports:
14
 
15
+ * **Stellar parameter estimation**
16
+ * **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
17
+ * **Similarity retrieval** across LAMOST LRS and Gaia XP spectra
18
 
19
  For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
20
  👉 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
 
40
 
41
  SpecCLIP consists of:
42
 
43
+ * **Two masked transformer encoders**
44
+ – LAMOST LRS
45
+ – Gaia XP
46
+ * **Contrastive alignment loss (CLIP-style)**
47
+ * **Domain-preserving prediction & reconstruction heads**
48
+ * **Cross-modal decoder** for spectrum translation
49
 
50
  It produces **shared embeddings** enabling multi-survey astrophysical analysis.
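
The CLIP-style contrastive alignment listed above can be illustrated with a minimal NumPy sketch of a symmetric InfoNCE loss. This is not SpecCLIP's training code: the function name `clip_style_loss`, the temperature of 0.07, and the random stand-in embeddings are assumptions made for illustration. Only the 768-dimensional embedding size comes from this card.

```python
import numpy as np


def clip_style_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    z_a, z_b: arrays of shape (N, D); row i of z_a is paired with row i of z_b.
    The temperature value is an assumed default, not taken from SpecCLIP.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # shape (N, N)

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the true pair is on the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the two retrieval directions (a -> b and b -> a).
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


# Stand-in data: random vectors playing the role of LRS and XP embeddings.
rng = np.random.default_rng(0)
lrs = rng.normal(size=(8, 768))
xp_aligned = lrs + 0.01 * rng.normal(size=(8, 768))  # nearly identical pairs
xp_random = rng.normal(size=(8, 768))                # unrelated pairs

loss_aligned = clip_style_loss(lrs, xp_aligned)
loss_random = clip_style_loss(lrs, xp_random)
print(f"aligned pairs: {loss_aligned:.4f}, random pairs: {loss_random:.4f}")
```

Well-aligned pairs should give a much lower loss than unrelated pairs, which is the signal a contrastive objective of this kind optimises.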
51
 
52
  ---
53
 
54
+ ## Sample Usage
55
+ The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).
56
+
57
+ ### Installation
58
+
59
+ First, create a conda environment and install requirements:
60
+ ```bash
61
+ conda create -n specclip-ai python=3.10
62
+ conda activate specclip-ai
63
+ conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
64
+ conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
65
+ pip install -r requirements.txt
66
+ pip install -e .
67
+ ```
68
+
69
+ ### Spectral Translation
70
+
71
+ Predict a Gaia XP spectrum from a LAMOST LRS spectrum:
72
+ ```python
73
+ import json
74
+ from spectral_retrieval import SpectralRetriever
75
+ from predict_lrs_wclip_v0 import load_spectrum_data
76
+
77
+ # Configuration
78
+ with open('config_retrieval.json', 'r') as f:
79
+     config = json.load(f)
80
+ retriever = SpectralRetriever(**config)
81
+
82
+ # Load the external spectra data
83
+ wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
84
+
85
+ # Predict corresponding Gaia XP spectrum
86
+ prediction_external = retriever.predict_cross_modal(
87
+     query_spectrum=(wavelength, flux),
88
+     query_type='lamost_spectra'
89
+ )
90
+
91
+ # Plot
92
+ retriever.plot_cross_modal_prediction(
93
+     prediction_external,
94
+     save_path='./plots/external_lamost_to_gaia_prediction.png'
95
+ )
96
+ ```
97
+
98
+ ### Spectral Similarity Search
99
+
100
+ Find the top-4 most similar stars from the Gaia XP catalog:
101
+ ```python
102
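+ # Download test data only (the leading `!` is Jupyter/IPython shell syntax; in a terminal, run the command without it)
103
+ !python download_and_setup.py --test-data-only
104
+
105
+ # Build embedding database from test data (reuses `retriever` and `load_spectrum_data` from the Spectral Translation example)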
106
+ retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')
107
+
108
+ # Load external LAMOST spectrum
109
+ wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
110
+
111
+ # Find similar Gaia XP spectra
112
+ results_external_cross = retriever.find_similar_spectra(
113
+     query_spectrum=(wavelength, flux),
114
+     query_type='lamost_spectra',
115
+     search_type='cross_modal',
116
+     top_k=4
117
+ )
118
+
119
+ # Plot
120
+ retriever.plot_retrieval_results(
121
+     results_external_cross,
122
+     save_path='./plots/external_lamost_to_gaia_cross.png'
123
+ )
124
+ ```
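
For intuition, cross-modal retrieval over a shared embedding space can be sketched as a cosine-similarity ranking. The snippet below is a hypothetical illustration, not the implementation behind `find_similar_spectra`: the helper `top_k_cosine`, the choice of cosine similarity, and the random stand-in database are all assumptions.

```python
import numpy as np


def top_k_cosine(query, database, k=4):
    """Return indices and cosine similarities of the k database rows closest to query.

    query: array of shape (D,); database: array of shape (N, D).
    Hypothetical helper for illustration only.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                   # cosine similarity to every database entry
    order = np.argsort(-scores)[:k]   # indices sorted by descending similarity
    return order, scores[order]


# Stand-in data: 1000 random 768-dim vectors playing the role of Gaia XP embeddings.
rng = np.random.default_rng(0)
gaia_db = rng.normal(size=(1000, 768))

# A query built as a noisy copy of entry 42, mimicking a matching LAMOST embedding.
query = gaia_db[42] + 0.1 * rng.normal(size=768)

idx, sims = top_k_cosine(query, gaia_db, k=4)
print(idx, sims)
```

In this toy setup the noisy copy's source entry should rank first, with the remaining matches far less similar.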
125
+
126
+ ### Parameter Prediction
127
+
128
+ **Coming soon.**
129
+ This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
130
+
131
+ ---
132
+
133
  ## 📄 Full Documentation
134
 
135
  To keep the Hugging Face card concise, **all detailed instructions**, including:
136
 
137
+ * Installation
138
+ * Parameter prediction
139
+ * Spectral translation
140
+ * Retrieval
141
+ * Full examples (Python + figures)
142
+ * Acknowledgments
143
 
144
  are available at the GitHub repo:
145
 
 
162
  month = jul,
163
  eid = {arXiv:2507.01939},
164
  pages = {arXiv:2507.01939},
165
+ doi = {10.48550/arXiv.2507.01939},
166
  archivePrefix = {arXiv},
167
  eprint = {2507.01939},
168
  primaryClass = {astro-ph.IM},
 
173
 
174
  ## 📬 Contact
175
 
176
+ * GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
177
+ * Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)