Astromer 2
Paper
Donoso-Oliva, C., Becker, I., Protopapas, P., Cabrera-Vives, G., Cádiz-Leyton, M., & Moreno-Cartagena, D. (2026). Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2. Astronomy & Astrophysics (in press).
@article{astromer2,
author = {Donoso-Oliva, C. and Becker, I. and Protopapas, P. and
Cabrera-Vives, G. and C{\'a}diz-Leyton, M. and Moreno-Cartagena, D.},
title = {Generalizing across astronomical surveys: Few-shot light curve
classification with {Astromer} 2},
journal = {Astronomy \& Astrophysics},
year = {2026},
note = {In press},
}
Original code
https://github.com/astromer-science/main-code (git submodule at models/astromer2/code/)
License
MIT – see LICENSE.
Model overview
Astromer 2 is a BERT-inspired transformer encoder pretrained on 1.5 million MACHO light curves via masked magnitude prediction. The encoder processes irregularly sampled photometric time series (time, magnitude) using MJD-aware positional encoding and a trainable mask token. It produces per-timestep contextual embeddings that can be aggregated into a fixed-size representation for downstream tasks such as few-shot classification.
Default configuration: 6 attention blocks, 4 heads, head dimension 64 (d_model = 256), sequence length 200, embedding dimension 256.
Input data format
Raw light curves are pairs (time, mag):
- `time` – observation time in days. Need not be absolute MJD; any consistent time axis in days works, because the pipeline subtracts the per-window mean before the encoder sees it. The pretrained weights were produced from MACHO data with MJD ~48800–51700.
- `mag` – magnitude. MACHO instrumental magnitudes are typically negative (e.g. −10 to −3); the pipeline is not restricted to that range.
Photometric errors are not used at inference. The upstream preprocessing code expects a 3-column [time, mag, err] array internally, but errors only appear in the pretraining reconstruction-loss weights (outputs['w_error']), which are never passed to the encoder. Pass dummy zeros if you run the pipeline directly.
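As a minimal sketch (with made-up observation values), a raw light curve can be packed into the 3-column `[time, mag, err]` array the upstream code expects, using dummy zeros for the unused error column:

```python
import numpy as np

# Hypothetical raw light curve: times in days, instrumental magnitudes.
time = np.array([48811.2, 48815.7, 48820.1, 48824.9])
mag = np.array([-5.31, -5.28, -5.35, -5.30])

# The upstream preprocessing expects a [n_obs, 3] array of [time, mag, err].
# Errors are not used at inference, so zeros are fine.
lc = np.stack([time, mag, np.zeros_like(mag)], axis=1)
```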
Preprocessing steps
All steps are implemented in code/src/data/loaders.py (get_loader) and code/src/data/preprocessing.py.
Step 1 – Windowing
The upstream code supports two windowing strategies via the sampling flag of to_windows:
- `sampling=True` – random window (used during pretraining): a single contiguous window of 200 observations is drawn at a uniformly random starting position. Light curves shorter than 200 observations are used in full.
- `sampling=False` – sequential windows (used for test-data generation): the light curve is divided into sequential, non-overlapping windows of 200 observations. A light curve of length L yields ⌊L/200⌋ + 1 windows; the last window may be shorter than 200 and is padded in step 3. Light curves shorter than 200 observations produce a single window. When a light curve produces multiple windows, each window yields a separate embedding vector; to obtain a single per-light-curve embedding, average the per-window embeddings.

Test data is generated with `sampling=False`.
Source: src/data/preprocessing.py:to_windows.
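The two strategies can be sketched as follows. This is a simplified re-implementation, not the upstream `to_windows`; in particular, the exact window count for edge cases (e.g. lengths that are exact multiples of 200) may differ:

```python
import numpy as np

def sequential_windows(lc, max_len=200):
    """sampling=False: split a [n_obs, 3] light curve into sequential,
    non-overlapping windows of at most max_len observations."""
    return [lc[i:i + max_len] for i in range(0, len(lc), max_len)]

def random_window(lc, max_len=200, rng=None):
    """sampling=True: draw one contiguous window of max_len observations
    at a uniformly random start; short curves are used in full."""
    rng = np.random.default_rng() if rng is None else rng
    if len(lc) <= max_len:
        return lc
    start = rng.integers(0, len(lc) - max_len + 1)
    return lc[start:start + max_len]

lc = np.zeros((450, 3))           # dummy light curve with 450 observations
windows = sequential_windows(lc)  # windows of length 200, 200, 50
```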
Step 2 – Zero-mean normalization
Subtract the per-window column mean from each column:
x_norm = x - mean(x, axis=0) # x has shape [n_obs, 3]; columns: time, mag, err
After this step, times = time − mean(time) and input = mag − mean(mag) are centred around zero.
Source: src/data/preprocessing.py:standardize.
Step 3 – Padding and mask construction
Right-pad the normalised sequence to exactly 200 time steps with zeros. Construct mask_in:
mask_in[i] = 0 for i < n_obs (real observation → visible to encoder)
mask_in[i] = 1 for i >= n_obs (padding → hidden from encoder)
Note on mask convention: the internal pipeline uses `mask_in = 0` for visible positions and `mask_in = 1` for padding/hidden positions. This is the opposite of the ONNX interface (see below).
Source: src/data/masking.py:mask_sample, padding block at the end.
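A minimal sketch of this step (assuming the window has at most 200 observations; `pad_and_mask` is an illustrative name, not the upstream function):

```python
import numpy as np

def pad_and_mask(x, max_len=200):
    """Right-pad a normalised [n_obs, 3] window with zeros to max_len steps
    and build the internal mask (0 = visible observation, 1 = padding)."""
    n_obs = len(x)
    padded = np.zeros((max_len, x.shape[1]), dtype=np.float32)
    padded[:n_obs] = x
    mask_in = np.ones((max_len, 1), dtype=np.float32)
    mask_in[:n_obs] = 0.0
    return padded, mask_in
```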
Step 4 – Format encoder inputs
Extract the two encoder inputs from the normalised, padded array:
| Tensor | Source | Shape |
|---|---|---|
| `input` | normalised magnitude column | `[batch, 200, 1]` |
| `times` | normalised time column | `[batch, 200, 1]` |
| `mask_in` | constructed in step 3 | `[batch, 200, 1]` |
The normalised error column is not fed to the encoder. Errors appear only in the pretraining reconstruction loss.
Source: src/data/loaders.py:format_inp_astromer (aversion='base').
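Steps 2–4 can be combined into one function for a single window. This is an illustrative sketch (`format_inputs` is not an upstream name) using the internal mask convention:

```python
import numpy as np

def format_inputs(window, max_len=200):
    """Normalise a raw [n_obs, 3] window (columns: time, mag, err),
    pad to max_len, and split into the three encoder tensors.
    Internal mask convention: 0 = visible, 1 = padding."""
    x = window - window.mean(axis=0)                  # step 2
    n_obs = len(x)
    padded = np.zeros((max_len, 3), dtype=np.float32)
    padded[:n_obs] = x                                # step 3
    mask_in = np.ones((max_len, 1), dtype=np.float32)
    mask_in[:n_obs] = 0.0
    times = padded[None, :, 0:1]                      # step 4: time column
    inp = padded[None, :, 1:2]                        # magnitude column
    return inp, times, mask_in[None]                  # err column is dropped
```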
Inputs (ONNX)
The exported ONNX models use a user-friendly mask convention that is the inverse of the internal pipeline:
| Tensor | Shape | Description |
|---|---|---|
| `input` | `[batch, 200, 1]` | mag − mean(mag) over the window (step 2 above) |
| `times` | `[batch, 200, 1]` | time − mean(time) over the window (step 2 above) |
| `mask_in` | `[batch, 200, 1]` | 1 = valid observation, 0 = padding |
The ONNX wrapper inverts mask_in internally before passing it to the encoder, so consumers can use the intuitive convention.
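A sketch of preparing ONNX-convention inputs for one window with synthetic data. The `onnxruntime` call is shown in comments because it requires the exported model file; the feed names (`input`, `times`, `mask_in`) follow the table above:

```python
import numpy as np
# import onnxruntime as ort  # uncomment to run the exported model

rng = np.random.default_rng(0)
n_obs = 120  # real observations in this window

t = np.sort(rng.uniform(0.0, 500.0, n_obs))   # synthetic times (days)
m = rng.normal(-5.0, 1.0, n_obs)              # synthetic magnitudes

inp = np.zeros((1, 200, 1), dtype=np.float32)
times = np.zeros((1, 200, 1), dtype=np.float32)
inp[0, :n_obs, 0] = m - m.mean()              # mag - mean(mag)
times[0, :n_obs, 0] = t - t.mean()            # time - mean(time)

# ONNX convention: 1 = valid observation, 0 = padding.
mask_in = np.zeros((1, 200, 1), dtype=np.float32)
mask_in[0, :n_obs, 0] = 1.0

# sess = ort.InferenceSession("astromer2_mean.onnx")
# (embedding,) = sess.run(None, {"input": inp, "times": times, "mask_in": mask_in})
# embedding.shape == (1, 256)
```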
Outputs (ONNX)
| File | Output shape | Aggregation |
|---|---|---|
| `astromer2_mean.onnx` | `[batch, 256]` | Masked mean pooling: `sum(z * mask_in) / sum(mask_in)` |
| `astromer2_max.onnx` | `[batch, 256]` | Masked max pooling over valid timesteps |
| `astromer2_full.onnx` | `[batch, 200, 256]` | Full per-timestep sequence; consumer aggregates |
ONNX opset: 13.
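If you use `astromer2_full.onnx`, the mean-pooled aggregation can be reproduced from the per-timestep output with a few lines of numpy (a sketch of the formula in the table above, using the ONNX mask convention of 1 = valid):

```python
import numpy as np

def masked_mean_pool(z, mask_in):
    """Collapse [batch, 200, 256] per-timestep embeddings into [batch, 256],
    averaging only over valid timesteps (mask_in: 1 = valid, 0 = padding)."""
    return (z * mask_in).sum(axis=1) / mask_in.sum(axis=1)
```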
Weights
Source: Zenodo record 18207945
Training dataset: MACHO (1.5 million light curves, V and R bands)
Checkpoint: astromer_v2/macho/
The test-data parquet file was generated with these MACHO weights and sampling=False (sequential windows).