nielsr HF Staff committed on
Commit 7fa636b · verified · 1 Parent(s): 2ca7b9c

Add metadata (license, pipeline tag) and usage examples


This PR enhances the model card by:
- Adding `license: mit` to the metadata, so the model's license is declared explicitly on the Hub.
- Adding `pipeline_tag: feature-extraction` to the metadata, which matches the model's primary function of generating embeddings and improves discoverability on the Hugging Face Hub.
- Adding a new "Sample Usage" section with code snippets adapted from the official GitHub repository. The snippets show how to set up the environment and use the model for spectral translation and similarity search, so users can get started directly from the Hub.
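
The two new metadata keys live in the YAML front matter at the top of `README.md`, as shown in the diff below. The following minimal sketch reads them back using only the standard library. It assumes flat `key: value` pairs (it is not a YAML parser), and `parse_front_matter` is a hypothetical helper, not part of `huggingface_hub`.

```python
def parse_front_matter(markdown_text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from a leading `---` ... `---` block.

    Hypothetical helper for illustration only: it does not handle nested YAML,
    lists, or quoted values.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter of the front matter block
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


readme_head = """---
license: mit
pipeline_tag: feature-extraction
---

# SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
"""

meta = parse_front_matter(readme_head)
print(meta["license"], meta["pipeline_tag"])  # mit feature-extraction
```

For real model cards, a proper YAML parser is the safer choice; this sketch only illustrates where the keys sit.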

Please review and merge this PR if everything looks good.

Files changed (1)
  1. README.md +102 -19
README.md CHANGED
@@ -1,3 +1,8 @@
  # 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
 
  [![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939)
@@ -7,9 +12,9 @@
  **SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
  It learns a **general-purpose spectral embedding (768-dim)** that supports:
 
- * **Stellar parameter estimation**
- * **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
- * **Similarity retrieval** across LAMOST LRS and GAIA XP spectra
 
  For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
  👉 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
@@ -35,27 +40,106 @@ The following pretrained weights are included in this model repository:
 
  SpecCLIP consists of:
 
- * **Two masked transformer encoders**
- – LAMOST LRS
- – Gaia XP
- * **Contrastive alignment loss (CLIP-style)**
- * **Domain-preserving prediction & reconstruction heads**
- * **Cross-modal decoder** for spectrum translation
 
  It produces **shared embeddings** enabling multi-survey astrophysical analysis.
 
  ---
 
  ## 📄 Full Documentation
 
  To keep the Hugging Face card concise, **all detailed instructions**, including:
 
- * Installation
- * Parameter prediction
- * Spectral translation
- * Retrieval
- * Full examples (Python + figures)
- * Acknowledgments
 
  are available at the GitHub repo:
 
@@ -78,7 +162,7 @@ are available at the GitHub repo:
  month = jul,
  eid = {arXiv:2507.01939},
  pages = {arXiv:2507.01939},
- doi = {10.48550/arXiv.2507.01939},
  archivePrefix = {arXiv},
  eprint = {2507.01939},
  primaryClass = {astro-ph.IM},
@@ -89,6 +173,5 @@ archivePrefix = {arXiv},
 
  ## 📬 Contact
 
- * GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
- * Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)
-
 
+ ---
+ license: mit
+ pipeline_tag: feature-extraction
+ ---
+
  # 🌌 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
 
  [![arXiv](https://img.shields.io/badge/arXiv-2507.01939-b31b1b.svg)](https://arxiv.org/abs/2507.01939)
 
  **SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
  It learns a **general-purpose spectral embedding (768-dim)** that supports:
 
+ * **Stellar parameter estimation**
+ * **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
+ * **Similarity retrieval** across LAMOST LRS and Gaia XP spectra
 
  For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
  👉 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
 
 
  SpecCLIP consists of:
 
+ * **Two masked transformer encoders**
+ – LAMOST LRS
+ – Gaia XP
+ * **Contrastive alignment loss (CLIP-style)**
+ * **Domain-preserving prediction & reconstruction heads**
+ * **Cross-modal decoder** for spectrum translation
 
  It produces **shared embeddings** enabling multi-survey astrophysical analysis.
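
The CLIP-style contrastive alignment listed above pulls together the embeddings of the same star's LAMOST LRS and Gaia XP spectra and pushes apart those of different stars. A common formulation is the symmetric InfoNCE loss popularised by CLIP. The NumPy sketch below illustrates that loss on random 768-dim stand-in vectors; the function name and the temperature of 0.07 are assumptions for illustration, and SpecCLIP's actual training objective (including its domain-preserving terms) is defined in the paper and repository.

```python
import numpy as np


def clip_contrastive_loss(emb_a: np.ndarray, emb_b: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE (CLIP-style) loss over a batch of paired embeddings.

    Row i of emb_a and row i of emb_b are assumed to describe the same star,
    e.g. its LAMOST LRS and Gaia XP spectra. Illustrative sketch only.
    """
    # L2-normalise each row so that dot products are cosine similarities
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch); matching pairs on the diagonal

    def diagonal_cross_entropy(x: np.ndarray) -> float:
        # Stable row-wise log-softmax; the correct "class" for row i is column i
        x = x - x.max(axis=1, keepdims=True)
        log_probs = x - np.log(np.exp(x).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Average the LRS->XP (rows) and XP->LRS (columns) directions
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


rng = np.random.default_rng(0)
lrs_emb = rng.normal(size=(8, 768))                     # stand-in LAMOST LRS embeddings
xp_aligned = lrs_emb + 0.1 * rng.normal(size=(8, 768))  # well-aligned Gaia XP partners
xp_random = rng.normal(size=(8, 768))                   # unrelated embeddings

print(clip_contrastive_loss(lrs_emb, xp_aligned) < clip_contrastive_loss(lrs_emb, xp_random))  # True
```

Well-aligned pairs drive the loss towards zero, while unrelated pairs leave it near log(batch size), which is the behaviour a shared embedding space relies on.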
 
  ---
 
+ ## Sample Usage
+ The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).
+
+ ### Installation
+
+ First, clone the repository, create a conda environment, and install the requirements from the repository root (`requirements.txt` and `pip install -e .` both refer to files there):
+ ```bash
+ git clone https://github.com/Xiaosheng-Zhao/SpecCLIP.git
+ cd SpecCLIP
+ conda create -n specclip-ai python=3.10
+ conda activate specclip-ai
+ conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
+ conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ ### Spectral Translation
+
+ Predict the corresponding Gaia XP spectrum from a LAMOST LRS spectrum:
+ ```python
+ import json
+ import os
+
+ from spectral_retrieval import SpectralRetriever
+ from predict_lrs_wclip_v0 import load_spectrum_data
+
+ # Build the retriever from the retrieval configuration file
+ with open('config_retrieval.json', 'r') as f:
+     config = json.load(f)
+ retriever = SpectralRetriever(**config)
+
+ # Load an external LAMOST LRS spectrum (wavelength grid and flux); this path
+ # assumes the test data is present (see `download_and_setup.py --test-data-only`
+ # in the similarity-search example)
+ wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
+
+ # Predict the corresponding Gaia XP spectrum
+ prediction_external = retriever.predict_cross_modal(
+     query_spectrum=(wavelength, flux),
+     query_type='lamost_spectra'
+ )
+
+ # Plot the prediction; create the output directory first in case it does not exist
+ os.makedirs('./plots', exist_ok=True)
+ retriever.plot_cross_modal_prediction(
+     prediction_external,
+     save_path='./plots/external_lamost_to_gaia_prediction.png'
+ )
+ ```
+
+ ### Spectral Similarity Search
+
+ Find the top-4 most similar stars in the Gaia XP catalog. This snippet continues from the Spectral Translation example and reuses `retriever` and `load_spectrum_data`:
+ ```python
+ import subprocess
+ import sys
+
+ # Download the test data only (notebook equivalent: `!python download_and_setup.py --test-data-only`)
+ subprocess.run([sys.executable, 'download_and_setup.py', '--test-data-only'], check=True)
+
+ # Build the embedding database from the test data
+ retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')
+
+ # Load an external LAMOST LRS spectrum
+ wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
+
+ # Find the most similar Gaia XP spectra (cross-modal: LAMOST query, Gaia XP results)
+ results_external_cross = retriever.find_similar_spectra(
+     query_spectrum=(wavelength, flux),
+     query_type='lamost_spectra',
+     search_type='cross_modal',
+     top_k=4
+ )
+
+ # Plot the retrieved spectra
+ retriever.plot_retrieval_results(
+     results_external_cross,
+     save_path='./plots/external_lamost_to_gaia_cross.png'
+ )
+ ```
+
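
Conceptually, cross-modal retrieval compares a query embedding from one survey against a database of embeddings from the other survey in the shared space. The NumPy sketch below shows cosine-similarity top-k search over 768-dim vectors. It assumes the embeddings are already computed, uses random stand-in data, and is not the implementation behind `SpectralRetriever.find_similar_spectra`, whose internals are defined in the repository; `top_k_cosine` is a hypothetical helper.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, database: np.ndarray, k: int = 4) -> tuple[np.ndarray, np.ndarray]:
    """Return indices and cosine similarities of the k database rows closest to query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                # cosine similarity of every database entry to the query
    idx = np.argsort(-sims)[:k]  # highest similarity first
    return idx, sims[idx]


rng = np.random.default_rng(42)
xp_database = rng.normal(size=(1000, 768))                  # stand-in Gaia XP embeddings
lrs_query = xp_database[123] + 0.05 * rng.normal(size=768)  # stand-in LAMOST LRS query near entry 123

indices, scores = top_k_cosine(lrs_query, xp_database, k=4)
print(indices[0])  # 123
```

Normalising once and using a single matrix-vector product keeps the search exact and simple; for very large catalogues an approximate nearest-neighbour index would be the usual replacement.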
+ ### Parameter Prediction
+
+ **Coming soon.**
+ This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
+
+ ---
+
  ## 📄 Full Documentation
 
  To keep the Hugging Face card concise, **all detailed instructions**, including:
 
+ * Installation
+ * Parameter prediction
+ * Spectral translation
+ * Retrieval
+ * Full examples (Python + figures)
+ * Acknowledgments
 
  are available at the GitHub repo:
 
 
  month = jul,
  eid = {arXiv:2507.01939},
  pages = {arXiv:2507.01939},
+ doi = {10.48550/arXiv.2507.01939},
  archivePrefix = {arXiv},
  eprint = {2507.01939},
  primaryClass = {astro-ph.IM},
 
 
  ## 📬 Contact
 
+ * GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
+ * Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)