# AstroM3 (photo encoder)

HuggingFace: `light-curve/astrom3`
## Paper

Rizhko, M. & Bloom, J. S. (2024). AstroM³: A self-supervised multimodal model for astronomy. arXiv:2411.08842.

```bibtex
@article{rizhko2024astrom3,
  author  = {Rizhko, Mariia and Bloom, Joshua S.},
  title   = {{AstroM³}: A self-supervised multimodal model for astronomy},
  journal = {arXiv preprint arXiv:2411.08842},
  year    = {2024}
}
```
## Original code

https://github.com/MeriDK/AstroM3 (git submodule at `models/astrom3/code/`)
## License

- Code (this repository): MIT (see LICENSE).
- Model weights (`AstroMLCore/AstroM3-CLIP-photo`): Creative Commons Attribution 4.0 (CC BY 4.0).
## Model overview

AstroM3 is a self-supervised multimodal contrastive model for variable-star classification that jointly trains photometry (light-curve), spectra, and metadata encoders with a CLIP-style objective. This integration exports the photo-only encoder from the pretrained CLIP checkpoint (`AstroMLCore/AstroM3-CLIP-photo`) as an ONNX embedding model.
The photo encoder is an Informer transformer (ProbSparse attention, 8 layers, d_model=128) trained on ZTF variable-star light curves from the MACC dataset. For ONNX export, the ProbSparse attention layers are replaced with standard scaled dot-product attention, which is equivalent in expectation and fully ONNX-exportable.
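For reference, a minimal numpy sketch of the standard masked scaled dot-product attention that replaces ProbSparse at export time (single-head, shapes illustrative; the real encoder uses multi-head attention with d_model=128):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Standard attention: softmax(q k^T / sqrt(d)) v.

    q, k, v: (batch, seq, d); mask: (batch, seq) with 1 = valid, 0 = padding.
    Padded key positions receive -inf scores, so they get zero weight.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)        # (batch, seq, seq)
    if mask is not None:
        scores = np.where(mask[:, None, :] == 1, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because every query attends to every valid key, this is the exact attention that ProbSparse approximates, and it maps onto standard ONNX operators.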
## Inputs

| Tensor | Shape | Description |
|---|---|---|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep; see preprocessing) |
| `mask` | `[batch, 200]` | 1 for valid timesteps, 0 for padding |
## Outputs (ONNX)

Single file `astrom3.onnx` with two named outputs:

| Output | Shape | Aggregation |
|---|---|---|
| `mean` | `[batch, 128]` | Masked mean pool of encoder outputs |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (unmasked) |
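The `mean` output can also be recomputed from `sequence` downstream. A minimal numpy sketch of the masked mean pool (the exported graph computes this internally; this is only a reference implementation):

```python
import numpy as np

def masked_mean_pool(sequence, mask):
    """Average per-timestep embeddings over valid timesteps only.

    sequence: (batch, 200, 128) encoder outputs.
    mask:     (batch, 200), 1 = valid, 0 = padding.
    Returns:  (batch, 128).
    """
    m = mask[:, :, None].astype(sequence.dtype)      # (batch, 200, 1)
    return (sequence * m).sum(axis=1) / m.sum(axis=1)
```

To run the exported model itself, feed `x_enc` and `mask` as float32 tensors to an inference session (e.g. `onnxruntime.InferenceSession("astrom3.onnx")`) and read both named outputs.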
## Preprocessing steps

The 9 input channels per timestep are built by `preprocess_lc()` in the upstream dataset (`AstroMLCore/AstroM3Dataset`):

| Index | Feature | How obtained |
|---|---|---|
| 0 | time (HJD scaled to [0, 1]) | per-observation |
| 1 | flux = (flux − mean) / MAD | per-observation |
| 2 | flux_err = flux_err / MAD | per-observation |
| 3 | amplitude | ASAS-SN catalog scalar, replicated to every timestep |
| 4 | period | ASAS-SN catalog scalar, replicated |
| 5 | lksl_statistic (Lafler-Kinman string length) | ASAS-SN catalog scalar, replicated |
| 6 | rfr_score (Random Forest Regressor R² for phase-folded LC) | ASAS-SN catalog scalar, replicated |
| 7 | log10(MAD_flux) | global scalar computed from the LC, replicated |
| 8 | delta_t = (max_HJD − min_HJD) / 365 | global scalar computed from the LC, replicated |
Features 3–6 come directly from the ASAS-SN v-band variable-star catalog (Jayasinghe et al. 2019) and are not recomputed from the light curve by this codebase. Users applying this model to non-ASAS-SN data must provide equivalent values (e.g. run a Lomb-Scargle period finder and compute peak-to-peak amplitude themselves).
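As a hedged sketch of computing stand-in values for `period` and `amplitude` on non-ASAS-SN data, using `scipy.signal.lombscargle` (the catalog's actual pipeline differs, so treat these as rough equivalents, not reproductions of the catalog values):

```python
import numpy as np
from scipy.signal import lombscargle

def estimate_period_amplitude(hjd, flux, min_period=0.1, max_period=100.0,
                              n_periods=5000):
    """Rough stand-ins for the ASAS-SN `period` and `amplitude` features.

    hjd:  observation times in days; flux: observed fluxes.
    Returns (period_days, peak_to_peak_amplitude).
    """
    # Angular-frequency grid spanning the candidate period range.
    omegas = 2 * np.pi / np.linspace(min_period, max_period, n_periods)
    power = lombscargle(hjd, flux - flux.mean(), omegas, normalize=True)
    period = 2 * np.pi / omegas[np.argmax(power)]
    amplitude = flux.max() - flux.min()  # peak-to-peak; robust variants possible
    return period, amplitude
```

A denser or log-spaced frequency grid, outlier-clipped amplitudes, or a dedicated period finder (e.g. astropy's `LombScargle`) may be preferable for survey-scale use.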
Preprocessing recipe for a single light curve:

- Deduplicate and sort observations by HJD.
- Compute `mean` and `MAD` of the flux column; normalize flux and flux_err.
- Scale HJD to [0, 1] over the span of the light curve.
- Compute `log10(MAD_flux)` and `delta_t = (max_HJD − min_HJD) / 365`.
- Obtain `amplitude`, `period`, `lksl_statistic`, `rfr_score` from the ASAS-SN catalog (or compute equivalents).
- Tile the 6 global scalars across all timesteps; concatenate with columns 0–2 to produce an `(N, 9)` array.
- Pad or center-crop to 200 timesteps; set `mask = 0` for padded positions.
- Use `float32` for all tensors.
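The recipe above as a minimal numpy sketch (the function name mirrors the upstream `preprocess_lc()`, but this is an illustrative reimplementation; the upstream dataset code is authoritative):

```python
import numpy as np

SEQ_LEN = 200

def preprocess_lc(hjd, flux, flux_err, amplitude, period,
                  lksl_statistic, rfr_score):
    """Build the (200, 9) x_enc array and (200,) mask for one light curve."""
    # Deduplicate and sort observations by HJD.
    hjd, idx = np.unique(hjd, return_index=True)
    flux, flux_err = flux[idx], flux_err[idx]

    # Normalize flux by mean/MAD; scale times to [0, 1].
    mad = np.median(np.abs(flux - np.median(flux)))
    flux_n = (flux - flux.mean()) / mad
    err_n = flux_err / mad
    t = (hjd - hjd.min()) / (hjd.max() - hjd.min())

    # Six global scalars, tiled across all timesteps.
    scalars = [amplitude, period, lksl_statistic, rfr_score,
               np.log10(mad), (hjd.max() - hjd.min()) / 365]
    x = np.column_stack([t, flux_n, err_n] +
                        [np.full(len(t), s) for s in scalars]).astype(np.float32)

    # Center-crop or zero-pad to SEQ_LEN; mask marks valid timesteps.
    n = len(t)
    if n >= SEQ_LEN:
        start = (n - SEQ_LEN) // 2
        return x[start:start + SEQ_LEN], np.ones(SEQ_LEN, dtype=np.float32)
    pad = np.zeros((SEQ_LEN - n, 9), dtype=np.float32)
    mask = np.concatenate([np.ones(n), np.zeros(SEQ_LEN - n)]).astype(np.float32)
    return np.vstack([x, pad]), mask
```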
## Weights

Source: https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo

The `model.safetensors` file is a standalone Informer checkpoint (classification head present but unused; loaded with `strict=False`).

Dataset: ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`).
## Applying the model without ASAS-SN catalog features

Features 3–6 require the ASAS-SN catalog. For users applying the model to other surveys, we measured the sensitivity of the `mean` embedding to each feature being replaced; `rfr_score` was studied in detail.
### rfr_score substitution

`rfr_score` is the R² of a Random Forest Regressor fit to the phase-folded light curve; it quantifies period quality (Jayasinghe et al. 2019, MNRAS 486, 1907, §5; arXiv:1809.07329). In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).

Setting all timesteps to the constant 0.392 (the empirical optimum, close to the dataset median) minimises the mean cosine distance from the true-feature embeddings:
| Metric | Value |
|---|---|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class | 0.049 ± 0.058 |
Per-class breakdown (5 samples per class from the ASAS-SN test split):
| Class | Mean dist | Std | True rfr mean |
|---|---|---|---|
| EW | 0.005 | 0.005 | −0.07 |
| SR | 0.004 | 0.003 | +0.50 |
| EA | 0.060 | 0.032 | +0.95 |
| RRAB | 0.020 | 0.011 | +0.83 |
| EB | 0.016 | 0.011 | +0.90 |
| ROT | 0.002 | 0.002 | +0.85 |
| RRC | 0.147 | 0.115 | −0.79 |
| HADS | 0.016 | 0.011 | +0.59 |
| M | 0.050 | 0.020 | +0.18 |
| DSCT | 0.170 | 0.182 | −0.86 |
Classes whose true rfr mean is far from 0.39 (RRC, DSCT) are most affected. Using an out-of-range value (e.g. ±100) causes cosine distances ~0.93–0.97, so staying within the training distribution is important.
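A minimal sketch of the substitution and the distance metric used above (the helper names are illustrative; in practice the embeddings come from the `mean` output of the ONNX model run on the original and substituted inputs):

```python
import numpy as np

RFR_CHANNEL = 6          # index of rfr_score in the 9-channel layout
RFR_SUBSTITUTE = 0.392   # empirical optimum reported above

def substitute_rfr(x_enc):
    """Return a copy of x_enc with the rfr_score channel set to the constant."""
    x = x_enc.copy()
    x[..., RFR_CHANNEL] = RFR_SUBSTITUTE
    return x

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```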