AstroM3 (photo encoder)

HuggingFace: light-curve/astrom3

Paper

Rizhko, M. et al. (2024). AstroM³: A self-supervised multimodal model for astronomy. arXiv:2411.08842.

@article{rizhko2024astrom3,
  author = {Rizhko, Mariia and Bloom, Joshua S.},
  title = {{AstroM³}: A self-supervised multimodal model for astronomy},
  journal = {arXiv preprint arXiv:2411.08842},
  year = {2024}
}

Original code

https://github.com/MeriDK/AstroM3 (git submodule at models/astrom3/code/)

License

  • Code (this repository): MIT — see LICENSE.
  • Model weights (AstroMLCore/AstroM3-CLIP-photo): Creative Commons Attribution 4.0 (CC BY 4.0).

Model overview

AstroM3 is a self-supervised multimodal contrastive model for variable-star classification that jointly trains photometry (light-curve), spectra, and metadata encoders using a CLIP-style objective. This integration exports the photo-only encoder from the pretrained CLIP checkpoint (AstroMLCore/AstroM3-CLIP-photo) as an ONNX embedding model.

The photo encoder is an Informer transformer (ProbSparse attention, 8 layers, d_model=128) trained on ZTF variable-star light curves from the MACC dataset. For ONNX export, the ProbSparse attention layers are replaced with standard scaled dot-product attention, which is equivalent in expectation and fully ONNX-exportable.
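The standard attention that replaces ProbSparse can be sketched in NumPy (shapes and names here are illustrative, not the upstream code):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Standard attention: softmax(q k^T / sqrt(d)) v.

    q, k, v: [batch, heads, seq, d_head]; mask: [batch, seq] with 1 = valid.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)   # [b, h, seq, seq]
    if mask is not None:
        # block attention to padded key positions
        scores = np.where(mask[:, None, None, :] == 1, scores, -1e9)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4, 200, 32)).astype(np.float32)
out = scaled_dot_product_attention(q, q, q)
print(out.shape)  # (2, 4, 200, 32)
```

Unlike ProbSparse, this computes the full seq × seq score matrix, which is what makes it straightforward to export to ONNX.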

Inputs

| Tensor  | Shape             | Description                                                              |
|---------|-------------------|--------------------------------------------------------------------------|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep — see preprocessing) |
| `mask`  | `[batch, 200]`    | 1 for valid timesteps, 0 for padding                                     |

Outputs (ONNX)

Single file astrom3.onnx with two named outputs:

| Output     | Shape               | Aggregation                                   |
|------------|---------------------|-----------------------------------------------|
| `mean`     | `[batch, 128]`      | Masked mean pool of encoder outputs           |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (unmasked)  |
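The `mean` output is the masked mean pool of `sequence`; the relation between the two outputs can be reproduced in NumPy (variable names are illustrative):

```python
import numpy as np

def masked_mean_pool(sequence, mask):
    """Average per-timestep embeddings over valid (mask == 1) positions.

    sequence: [batch, 200, 128]; mask: [batch, 200] of 0/1.
    Returns: [batch, 128].
    """
    m = mask[..., None].astype(sequence.dtype)          # [batch, 200, 1]
    return (sequence * m).sum(axis=1) / m.sum(axis=1)   # normalize by valid count

rng = np.random.default_rng(0)
seq = rng.normal(size=(2, 200, 128)).astype(np.float32)
mask = np.zeros((2, 200), dtype=np.float32)
mask[:, :150] = 1.0                                     # first 150 steps valid
pooled = masked_mean_pool(seq, mask)
print(pooled.shape)  # (2, 128)
```

Padded positions contribute nothing to `mean`, but note that `sequence` itself is unmasked, so downstream users must apply `mask` themselves.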

Preprocessing steps

The 9 input channels per timestep are built by preprocess_lc() in the upstream dataset (AstroMLCore/AstroM3Dataset):

| Index | Feature                                                      | How obtained                                      |
|-------|--------------------------------------------------------------|---------------------------------------------------|
| 0     | time (HJD scaled to [0, 1])                                  | per-observation                                   |
| 1     | flux = (flux − mean) / MAD                                   | per-observation                                   |
| 2     | flux_err = flux_err / MAD                                    | per-observation                                   |
| 3     | amplitude                                                    | ASAS-SN catalog scalar, replicated to every timestep |
| 4     | period                                                       | ASAS-SN catalog scalar, replicated                |
| 5     | lksl_statistic (Lafler-Kinman string length)                 | ASAS-SN catalog scalar, replicated                |
| 6     | rfr_score (Random Forest Regressor R² for phase-folded LC)   | ASAS-SN catalog scalar, replicated                |
| 7     | log10(MAD_flux)                                              | global scalar computed from LC, replicated        |
| 8     | delta_t = (max_HJD − min_HJD) / 365                          | global scalar computed from LC, replicated        |

Features 3–6 come directly from the ASAS-SN v-band variable-star catalog (Jayasinghe et al. 2019) and are not recomputed from the light curve by this codebase. Users applying this model to non-ASAS-SN data must provide equivalent values (e.g. run a Lomb-Scargle period finder and compute peak-to-peak amplitude themselves).
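One way to produce rough equivalents of the period and amplitude features from a raw light curve is a Lomb-Scargle periodogram plus a peak-to-peak measurement. A minimal sketch using SciPy (this is an illustration, not the ASAS-SN pipeline, and the function names are my own):

```python
import numpy as np
from scipy.signal import lombscargle

def estimate_period_amplitude(hjd, flux, min_p=0.1, max_p=50.0, n=5000):
    """Crude period (Lomb-Scargle peak) and peak-to-peak amplitude estimates."""
    periods = np.geomspace(min_p, max_p, n)
    ang_freqs = 2.0 * np.pi / periods               # lombscargle expects rad/day
    power = lombscargle(hjd, flux - flux.mean(), ang_freqs)
    period = periods[np.argmax(power)]
    amplitude = flux.max() - flux.min()             # peak-to-peak
    return period, amplitude

# Irregularly sampled sinusoid with a 2.5-day period
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 100.0, size=400))
y = np.sin(2.0 * np.pi * t / 2.5) + 0.05 * rng.normal(size=400)
p, a = estimate_period_amplitude(t, y)
```

The lksl_statistic and rfr_score have no drop-in substitutes here; see the constant-substitution analysis below for rfr_score.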

Preprocessing recipe for a single light curve:

  1. Deduplicate and sort observations by HJD.
  2. Compute mean and MAD of the flux column; normalize flux and flux_err.
  3. Scale HJD to [0, 1] over the span of the light curve.
  4. Compute log10(MAD_flux) and delta_t = (max_HJD − min_HJD) / 365.
  5. Obtain amplitude, period, lksl_statistic, rfr_score from the ASAS-SN catalog (or compute equivalents).
  6. Tile the 6 global scalars across all timesteps; concatenate with columns 0–2 to produce an (N, 9) array.
  7. Pad or center-crop to 200 timesteps; set mask = 0 for padded positions.
  8. Use float32 for all tensors.
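
The recipe above can be sketched end to end in NumPy (catalog scalars are passed in; names are illustrative, not the upstream `preprocess_lc()`):

```python
import numpy as np

def preprocess_lc(hjd, flux, flux_err, amplitude, period, lksl, rfr, seq_len=200):
    """Build the [seq_len, 9] input and [seq_len] mask for one light curve."""
    order = np.argsort(hjd)                          # step 1: sort (dedupe omitted)
    hjd, flux, flux_err = hjd[order], flux[order], flux_err[order]

    mad = np.median(np.abs(flux - np.median(flux)))  # step 2: MAD normalization
    flux_n = (flux - flux.mean()) / mad
    err_n = flux_err / mad

    t = (hjd - hjd.min()) / (hjd.max() - hjd.min())  # step 3: time to [0, 1]
    log_mad = np.log10(mad)                          # step 4: global scalars
    delta_t = (hjd.max() - hjd.min()) / 365.0

    n = len(hjd)
    scalars = np.array([amplitude, period, lksl, rfr, log_mad, delta_t])
    x = np.concatenate(
        [np.stack([t, flux_n, err_n], axis=1),       # per-observation channels 0-2
         np.tile(scalars, (n, 1))], axis=1)          # steps 5-6: tiled channels 3-8

    out = np.zeros((seq_len, 9), dtype=np.float32)   # steps 7-8: pad/crop, float32
    mask = np.zeros(seq_len, dtype=np.float32)
    if n >= seq_len:                                 # center-crop long curves
        start = (n - seq_len) // 2
        out[:] = x[start:start + seq_len]
        mask[:] = 1.0
    else:
        out[:n] = x
        mask[:n] = 1.0
    return out, mask

rng = np.random.default_rng(0)
hjd = np.sort(rng.uniform(2458000, 2458100, 120))
flux = rng.normal(10.0, 1.0, 120)
x, m = preprocess_lc(hjd, flux, 0.1 * np.ones(120), 0.5, 2.5, 0.3, 0.4)
print(x.shape, int(m.sum()))  # (200, 9) 120
```

Batching is then a simple `np.stack` of the per-curve `(x, mask)` pairs.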

Weights

Source: https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo

The model.safetensors file is a standalone Informer checkpoint (classification head present but unused; loaded with strict=False).

Dataset: ASAS-SN v-band variable-star light curves (AstroMLCore/AstroM3Processed).

Applying the model without ASAS-SN catalog features

Features 3–6 require the ASAS-SN catalog. For users applying the model to other surveys, we measured how sensitive the mean embedding is to replacing each feature; rfr_score was studied in detail.

rfr_score substitution

rfr_score is the R² of a Random Forest Regressor fit to the phase-folded light curve; it quantifies period quality (Jayasinghe et al. 2019, MNRAS 486 1907, §5; arXiv:1809.07329). In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).

Setting all timesteps to the constant 0.392 (the empirical optimum, approximately the dataset median) minimizes mean cosine distance from the true-feature embeddings:

| Metric                       | Value         |
|------------------------------|---------------|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class      | 0.049 ± 0.058 |

Per-class breakdown (5 samples per class from the ASAS-SN test split):

| Class | Mean dist | Std   | True rfr mean |
|-------|-----------|-------|---------------|
| EW    | 0.005     | 0.005 | −0.07         |
| SR    | 0.004     | 0.003 | +0.50         |
| EA    | 0.060     | 0.032 | +0.95         |
| RRAB  | 0.020     | 0.011 | +0.83         |
| EB    | 0.016     | 0.011 | +0.90         |
| ROT   | 0.002     | 0.002 | +0.85         |
| RRC   | 0.147     | 0.115 | −0.79         |
| HADS  | 0.016     | 0.011 | +0.59         |
| M     | 0.050     | 0.020 | +0.18         |
| DSCT  | 0.170     | 0.182 | −0.86         |

Classes whose true rfr mean is far from 0.39 (RRC, DSCT) are most affected. Using an out-of-range value (e.g. ±100) causes cosine distances ~0.93–0.97, so staying within the training distribution is important.
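
Applying the substitution is a one-line change to the preprocessed tensor (channel index 6, per the preprocessing table above; a hedged sketch):

```python
import numpy as np

RFR_CHANNEL = 6       # rfr_score column in the [batch, 200, 9] input
RFR_CONSTANT = 0.392  # empirical optimum from the study above

def substitute_rfr(x_enc):
    """Replace the rfr_score channel with the constant at every timestep."""
    x = x_enc.copy()
    x[..., RFR_CHANNEL] = RFR_CONSTANT
    return x

x = np.zeros((1, 200, 9), dtype=np.float32)
x2 = substitute_rfr(x)
print(round(float(x2[0, 0, RFR_CHANNEL]), 3))  # 0.392
```

All other channels are left untouched; only the catalog-dependent rfr_score is replaced.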
