Add metadata (license, pipeline tag) and usage examples
This PR enhances the model card by:
- Adding the `license: mit` to the metadata, ensuring the model is properly licensed.
- Adding the `pipeline_tag: feature-extraction` to the metadata, aligning with the model's primary function of generating embeddings. This improves discoverability on the Hugging Face Hub.
- Integrating sample usage code snippets from the official GitHub repository into a new "Sample Usage" section. This provides users with immediate examples of how to install the necessary environment and interact with the model for spectral translation and similarity search, improving the model's usability directly from the Hub.
Please review and merge this PR if everything looks good.
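As a rough illustration of the metadata change described above, the sketch below extracts the front matter from a model card string and checks the two keys this PR adds. The `parse_front_matter` helper is hypothetical and handles only flat `key: value` pairs; it is not part of the Hugging Face tooling, and a real card may need a full YAML parser.

```python
def parse_front_matter(card_text: str) -> dict[str, str]:
    """Return flat key/value pairs from a leading '---' ... '---' block.

    Hypothetical helper for illustration: handles only simple `key: value`
    lines, not nested YAML.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the card
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # opening delimiter was never closed


card = """---
license: mit
pipeline_tag: feature-extraction
---

# SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
"""

metadata = parse_front_matter(card)
print(metadata)
```

A card without a leading `---` block yields an empty dictionary, which makes missing metadata easy to flag.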
README.md
CHANGED
@@ -1,3 +1,8 @@
 # SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

 [arXiv:2507.01939](https://arxiv.org/abs/2507.01939)
@@ -7,9 +12,9 @@
 **SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
 It learns a **general-purpose spectral embedding (768-dim)** that supports:

-*
-*
-*

 For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
@@ -35,27 +40,106 @@ The following pretrained weights are included in this model repository:

 SpecCLIP consists of:

-*
-
-
-*
-*
-*

 It produces **shared embeddings** enabling multi-survey astrophysical analysis.

 ---

 ## Full Documentation

 To keep the Hugging Face card concise, **all detailed instructions**, including:

-*
-*
-*
-*
-*
-*

 are available at the GitHub repo:

@@ -78,7 +162,7 @@ are available at the GitHub repo:
 month = jul,
 eid = {arXiv:2507.01939},
 pages = {arXiv:2507.01939},
-doi = {10.48550/arXiv.
 archivePrefix = {arXiv},
 eprint = {2507.01939},
 primaryClass = {astro-ph.IM},
@@ -89,6 +173,5 @@ archivePrefix = {arXiv},

 ## Contact

-*
-*
-
@@ -1,3 +1,8 @@
+---
+license: mit
+pipeline_tag: feature-extraction
+---
+
 # SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

 [arXiv:2507.01939](https://arxiv.org/abs/2507.01939)
@@ -7,9 +12,9 @@
 **SpecCLIP** is a contrastive + domain-preserving foundation model designed to align **LAMOST LRS** spectra with **Gaia XP** spectrophotometric data.
 It learns a **general-purpose spectral embedding (768-dim)** that supports:

+* **Stellar parameter estimation**
+* **Cross-survey spectral translation** (LAMOST LRS ⟷ Gaia XP)
+* **Similarity retrieval** across LAMOST LRS and Gaia XP spectra

 For full documentation, installation instructions, examples, and end-to-end usage, please visit the **GitHub repository**:
 [https://github.com/Xiaosheng-Zhao/SpecCLIP](https://github.com/Xiaosheng-Zhao/SpecCLIP)
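
Similarity retrieval over a shared embedding space usually comes down to a nearest-neighbour search. The sketch below ranks a small gallery of vectors by cosine similarity to a query. Cosine similarity is an assumption here (the model card does not state which metric SpecCLIP uses), the 4-dimensional vectors stand in for the 768-dimensional embeddings, and `top_k_similar` is a hypothetical helper rather than the repository's API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, gallery, k):
    """Return (index, similarity) for the k gallery vectors closest to query."""
    scored = ((i, cosine_similarity(query, v)) for i, v in enumerate(gallery))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy stand-ins for embeddings; real ones would come from the encoders.
gallery = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
]
query = [1.0, 0.05, 0.0, 0.0]

matches = top_k_similar(query, gallery, k=2)
for index, score in matches:
    print(f"gallery[{index}] similarity={score:.3f}")
```

For a catalog-sized gallery, a vectorized or approximate nearest-neighbour index would be the more practical choice; the loop above only shows the ranking logic.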
@@ -35,27 +40,106 @@ The following pretrained weights are included in this model repository:

 SpecCLIP consists of:

+* **Two masked transformer encoders**
+  * LAMOST LRS
+  * Gaia XP
+* **Contrastive alignment loss (CLIP-style)**
+* **Domain-preserving prediction & reconstruction heads**
+* **Cross-modal decoder** for spectrum translation

 It produces **shared embeddings** enabling multi-survey astrophysical analysis.

 ---
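
A CLIP-style alignment loss is commonly written as a symmetric cross-entropy over the pairwise similarity matrix of a batch, with matched pairs on the diagonal. The sketch below follows that generic formulation in plain Python. The temperature value, the toy 2-dimensional embeddings, and the function names are illustrative assumptions, and SpecCLIP's actual loss may differ in its details.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss for paired embeddings from two modalities."""
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # logits[i][j]: temperature-scaled cosine similarity between a[i] and b[j]
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: row i of each list is assumed to describe the same star.
lamost_emb = [[1.0, 0.0], [0.0, 1.0]]
gaia_aligned = [[0.9, 0.1], [0.1, 0.9]]
gaia_swapped = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = clip_style_loss(lamost_emb, gaia_aligned)
swapped_loss = clip_style_loss(lamost_emb, gaia_swapped)
print(f"aligned: {aligned_loss:.4f}, swapped: {swapped_loss:.4f}")
```

The loss is small when each spectrum is most similar to its own counterpart and grows when the pairing is scrambled, which is the behaviour that pulls the two encoders into a shared space.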

+## Sample Usage
+The following examples are adapted from the [official GitHub repository](https://github.com/Xiaosheng-Zhao/SpecCLIP).
+
+### Installation
+
+First, create a conda environment and install the requirements:
+```bash
+conda create -n specclip-ai python=3.10
+conda activate specclip-ai
+conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=11.8 -c pytorch -c nvidia
+conda install numpy==2.0.1 scipy==1.15.3 pandas==2.3.3 mkl mkl-service -c defaults
+# Run the pip steps from the root of a local clone of the SpecCLIP repository
+pip install -r requirements.txt
+pip install -e .
+```
+
+### Spectral Translation
+
+Predict a Gaia XP spectrum from a LAMOST LRS spectrum:
+```python
+import json
+import os
+
+from spectral_retrieval import SpectralRetriever
+from predict_lrs_wclip_v0 import load_spectrum_data
+
+# Configuration: build the retriever from the JSON settings file
+with open('config_retrieval.json', 'r') as f:
+    config = json.load(f)
+retriever = SpectralRetriever(**config)
+
+# Load the external LAMOST LRS spectrum
+wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
+
+# Predict the corresponding Gaia XP spectrum
+prediction_external = retriever.predict_cross_modal(
+    query_spectrum=(wavelength, flux),
+    query_type='lamost_spectra'
+)
+
+# Plot (make sure the output directory exists first)
+os.makedirs('./plots', exist_ok=True)
+retriever.plot_cross_modal_prediction(
+    prediction_external,
+    save_path='./plots/external_lamost_to_gaia_prediction.png'
+)
+```
+
+### Spectral Similarity Search
+
+Find the top-4 most similar stars from the Gaia XP catalog:
+```python
+import os
+import subprocess
+import sys
+
+# Download test data only (in a notebook: !python download_and_setup.py --test-data-only)
+subprocess.run([sys.executable, 'download_and_setup.py', '--test-data-only'], check=True)
+
+# `retriever` and `load_spectrum_data` come from the Spectral Translation example above
+# Build embedding database from test data
+retriever.build_embedding_database(batch_size=1000, save_path='./test_embeddings.npz')
+
+# Load external LAMOST spectrum
+wavelength, flux = load_spectrum_data('./test_data/lrs/sample1_matrix.fits')
+
+# Find similar Gaia XP spectra
+results_external_cross = retriever.find_similar_spectra(
+    query_spectrum=(wavelength, flux),
+    query_type='lamost_spectra',
+    search_type='cross_modal',
+    top_k=4
+)
+
+# Plot (make sure the output directory exists first)
+os.makedirs('./plots', exist_ok=True)
+retriever.plot_retrieval_results(
+    results_external_cross,
+    save_path='./plots/external_lamost_to_gaia_cross.png'
+)
+```
+
+### Parameter Prediction
+
+**Coming soon.**
+This section will include examples of using SpecCLIP embeddings with downstream models (e.g., MLP, SBI) for stellar-parameter prediction.
+
+---
+
 ## Full Documentation

 To keep the Hugging Face card concise, **all detailed instructions**, including:

+* Installation
+* Parameter prediction
+* Spectral translation
+* Retrieval
+* Full examples (Python + figures)
+* Acknowledgments

 are available at the GitHub repo:

@@ -78,7 +162,7 @@ are available at the GitHub repo:
 month = jul,
 eid = {arXiv:2507.01939},
 pages = {arXiv:2507.01939},
+doi = {10.48550/arXiv.2507.01939},
 archivePrefix = {arXiv},
 eprint = {2507.01939},
 primaryClass = {astro-ph.IM},
@@ -89,6 +173,5 @@ archivePrefix = {arXiv},

 ## Contact

+* GitHub Issues: [https://github.com/Xiaosheng-Zhao/SpecCLIP/issues](https://github.com/Xiaosheng-Zhao/SpecCLIP/issues)
+* Email: [xzhao113@jh.edu](mailto:xzhao113@jh.edu)