# AstroM3 (photo encoder)

HuggingFace: `light-curve/astrom3`
## Paper

Rizhko, M. & Bloom, J. S. (2024). AstroM³: A self-supervised multimodal model for astronomy. arXiv:2411.08842.

```bibtex
@article{rizhko2024astrom3,
  author  = {Rizhko, Mariia and Bloom, Joshua S.},
  title   = {{AstroM³}: A self-supervised multimodal model for astronomy},
  journal = {arXiv preprint arXiv:2411.08842},
  year    = {2024}
}
```
## Original code

https://github.com/MeriDK/AstroM3 (git submodule at `models/astrom3/code/`)
## License

- Code (this repository): MIT (see LICENSE).
- Model weights (`AstroMLCore/AstroM3-CLIP-photo`): Creative Commons Attribution 4.0 (CC BY 4.0).
## Model overview

AstroM3 is a self-supervised multimodal contrastive model for variable-star classification that jointly trains photometry (light-curve), spectra, and metadata encoders with a CLIP-style objective. This integration exports the photo-only encoder from the pretrained CLIP checkpoint (`AstroMLCore/AstroM3-CLIP-photo`) as an ONNX embedding model.
The photo encoder is an Informer transformer (ProbSparse attention, 8 layers, d_model=128) trained on ZTF variable-star light curves from the MACC dataset. For ONNX export, the ProbSparse attention layers are replaced with standard scaled dot-product attention, which is equivalent in expectation and fully ONNX-exportable.
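For reference, a minimal numpy sketch of the standard masked scaled dot-product attention that replaces ProbSparse at export time (single-head, shapes illustrative; the real encoder uses multi-head attention with d_model=128):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Standard attention: softmax(q k^T / sqrt(d)) v.

    q, k, v: (batch, seq, d); mask: (batch, seq) with 1 = valid, 0 = padding.
    Padded key positions receive -inf scores, so they get zero weight.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)        # (batch, seq, seq)
    if mask is not None:
        scores = np.where(mask[:, None, :] == 1, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because every query attends to every valid key, this is the exact attention that ProbSparse approximates, and it maps onto standard ONNX operators.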
## Inputs

| Tensor | Shape | Description |
|---|---|---|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep; see preprocessing) |
| `mask` | `[batch, 200]` | 1 for valid timesteps, 0 for padding |
## Outputs (ONNX)

Single file `astrom3.onnx` with two named outputs:

| Output | Shape | Aggregation |
|---|---|---|
| `mean` | `[batch, 128]` | Masked mean pool of encoder outputs |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (unmasked) |
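The `mean` output can also be recomputed from `sequence` downstream. A minimal numpy sketch of the masked mean pool (the exported graph computes this internally; this is only a reference implementation):

```python
import numpy as np

def masked_mean_pool(sequence, mask):
    """Average per-timestep embeddings over valid timesteps only.

    sequence: (batch, 200, 128) encoder outputs.
    mask:     (batch, 200), 1 = valid, 0 = padding.
    Returns:  (batch, 128).
    """
    m = mask[:, :, None].astype(sequence.dtype)      # (batch, 200, 1)
    return (sequence * m).sum(axis=1) / m.sum(axis=1)
```

To run the exported model itself, feed `x_enc` and `mask` as float32 tensors to an inference session (e.g. `onnxruntime.InferenceSession("astrom3.onnx")`) and read both named outputs.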
## Preprocessing steps

The 9 input channels per timestep are built by `preprocess_lc()` in the upstream dataset (`AstroMLCore/AstroM3Dataset`):

| Index | Feature | How obtained |
|---|---|---|
| 0 | time (HJD scaled to [0, 1]) | per-observation |
| 1 | flux = (flux − mean) / MAD | per-observation |
| 2 | flux_err = flux_err / MAD | per-observation |
| 3 | amplitude | ASAS-SN catalog scalar, replicated to every timestep |
| 4 | period | ASAS-SN catalog scalar, replicated |
| 5 | lksl_statistic (Lafler-Kinman string length) | ASAS-SN catalog scalar, replicated |
| 6 | rfr_score (Random Forest Regressor R² for phase-folded LC) | ASAS-SN catalog scalar, replicated |
| 7 | log10(MAD_flux) | global scalar computed from the LC, replicated |
| 8 | delta_t = (max_HJD − min_HJD) / 365 | global scalar computed from the LC, replicated |
Features 3–6 come directly from the ASAS-SN v-band variable-star catalog (Jayasinghe et al. 2019) and are not recomputed from the light curve by this codebase. Users applying this model to non-ASAS-SN data must provide equivalent values (e.g. run a Lomb-Scargle period finder and compute peak-to-peak amplitude themselves).
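As a hedged sketch of computing stand-in values for `period` and `amplitude` on non-ASAS-SN data, using `scipy.signal.lombscargle` (the catalog's actual pipeline differs, so treat these as rough equivalents, not reproductions of the catalog values):

```python
import numpy as np
from scipy.signal import lombscargle

def estimate_period_amplitude(hjd, flux, min_period=0.1, max_period=100.0,
                              n_periods=5000):
    """Rough stand-ins for the ASAS-SN `period` and `amplitude` features.

    hjd:  observation times in days; flux: observed fluxes.
    Returns (period_days, peak_to_peak_amplitude).
    """
    # Angular-frequency grid spanning the candidate period range.
    omegas = 2 * np.pi / np.linspace(min_period, max_period, n_periods)
    power = lombscargle(hjd, flux - flux.mean(), omegas, normalize=True)
    period = 2 * np.pi / omegas[np.argmax(power)]
    amplitude = flux.max() - flux.min()  # peak-to-peak; robust variants possible
    return period, amplitude
```

A denser or log-spaced frequency grid, outlier-clipped amplitudes, or a dedicated period finder (e.g. astropy's `LombScargle`) may be preferable for survey-scale use.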
Preprocessing recipe for a single light curve:

- Deduplicate and sort observations by HJD.
- Compute `mean` and `MAD` of the flux column; normalize flux and flux_err.
- Scale HJD to [0, 1] over the span of the light curve.
- Compute `log10(MAD_flux)` and `delta_t = (max_HJD − min_HJD) / 365`.
- Obtain `amplitude`, `period`, `lksl_statistic`, `rfr_score` from the ASAS-SN catalog (or compute equivalents).
- Tile the 6 global scalars across all timesteps; concatenate with columns 0–2 to produce an `(N, 9)` array.
- Pad or center-crop to 200 timesteps; set `mask = 0` for padded positions.
- Use `float32` for all tensors.
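The recipe above as a minimal numpy sketch (the function name mirrors the upstream `preprocess_lc()`, but this is an illustrative reimplementation; the upstream dataset code is authoritative):

```python
import numpy as np

SEQ_LEN = 200

def preprocess_lc(hjd, flux, flux_err, amplitude, period,
                  lksl_statistic, rfr_score):
    """Build the (200, 9) x_enc array and (200,) mask for one light curve."""
    # Deduplicate and sort observations by HJD.
    hjd, idx = np.unique(hjd, return_index=True)
    flux, flux_err = flux[idx], flux_err[idx]

    # Normalize flux by mean/MAD; scale times to [0, 1].
    mad = np.median(np.abs(flux - np.median(flux)))
    flux_n = (flux - flux.mean()) / mad
    err_n = flux_err / mad
    t = (hjd - hjd.min()) / (hjd.max() - hjd.min())

    # Six global scalars, tiled across all timesteps.
    scalars = [amplitude, period, lksl_statistic, rfr_score,
               np.log10(mad), (hjd.max() - hjd.min()) / 365]
    x = np.column_stack([t, flux_n, err_n] +
                        [np.full(len(t), s) for s in scalars]).astype(np.float32)

    # Center-crop or zero-pad to SEQ_LEN; mask marks valid timesteps.
    n = len(t)
    if n >= SEQ_LEN:
        start = (n - SEQ_LEN) // 2
        return x[start:start + SEQ_LEN], np.ones(SEQ_LEN, dtype=np.float32)
    pad = np.zeros((SEQ_LEN - n, 9), dtype=np.float32)
    mask = np.concatenate([np.ones(n), np.zeros(SEQ_LEN - n)]).astype(np.float32)
    return np.vstack([x, pad]), mask
```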
## Weights

Source: https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo

The `model.safetensors` file is a standalone Informer checkpoint (classification head present but unused; loaded with `strict=False`).

Dataset: ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`).
## Applying the model without ASAS-SN catalog features

Features 3–6 require the ASAS-SN catalog. For users applying the model to other surveys, we measured the sensitivity of the `mean` embedding to each feature being replaced; `rfr_score` was studied in detail.
### rfr_score substitution

`rfr_score` is the R² of a Random Forest Regressor fit to the phase-folded light curve; it quantifies period quality (Jayasinghe et al. 2019, MNRAS 486, 1907, §5; arXiv:1809.07329). In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).

Setting all timesteps to the constant 0.392 (the empirical optimum, close to the dataset median) minimises the mean cosine distance from the true-feature embeddings:
| Metric | Value |
|---|---|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class | 0.049 ± 0.058 |
Per-class breakdown (5 samples per class from the ASAS-SN test split):
| Class | Mean dist | Std | True rfr mean |
|---|---|---|---|
| EW | 0.005 | 0.005 | −0.07 |
| SR | 0.004 | 0.003 | +0.50 |
| EA | 0.060 | 0.032 | +0.95 |
| RRAB | 0.020 | 0.011 | +0.83 |
| EB | 0.016 | 0.011 | +0.90 |
| ROT | 0.002 | 0.002 | +0.85 |
| RRC | 0.147 | 0.115 | −0.79 |
| HADS | 0.016 | 0.011 | +0.59 |
| M | 0.050 | 0.020 | +0.18 |
| DSCT | 0.170 | 0.182 | −0.86 |
Classes whose true rfr mean is far from 0.39 (RRC, DSCT) are most affected. Using an out-of-range value (e.g. ±100) causes cosine distances ~0.93–0.97, so staying within the training distribution is important.
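A minimal sketch of the substitution and the distance metric used above (the helper names are illustrative; in practice the embeddings come from the `mean` output of the ONNX model run on the original and substituted inputs):

```python
import numpy as np

RFR_CHANNEL = 6          # index of rfr_score in the 9-channel layout
RFR_SUBSTITUTE = 0.392   # empirical optimum reported above

def substitute_rfr(x_enc):
    """Return a copy of x_enc with the rfr_score channel set to the constant."""
    x = x_enc.copy()
    x[..., RFR_CHANNEL] = RFR_SUBSTITUTE
    return x

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```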