Astromer 2

Paper

Donoso-Oliva, C., Becker, I., Protopapas, P., Cabrera-Vives, G., Cádiz-Leyton, M., & Moreno-Cartagena, D. (2026). Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2. Astronomy & Astrophysics (in press).

@article{astromer2,
  author  = {Donoso-Oliva, C. and Becker, I. and Protopapas, P. and
             Cabrera-Vives, G. and C{\'a}diz-Leyton, M. and Moreno-Cartagena, D.},
  title   = {Generalizing across astronomical surveys: Few-shot light curve
             classification with {Astromer} 2},
  journal = {Astronomy \& Astrophysics},
  year    = {2026},
  note    = {In press},
}

Original code

https://github.com/astromer-science/main-code (git submodule at models/astromer2/code/)

License

MIT — see LICENSE.

Model overview

Astromer 2 is a BERT-inspired transformer encoder pretrained on 1.5 million MACHO light curves via masked magnitude prediction. The encoder processes irregularly-sampled photometric time series (time, magnitude) using MJD-aware positional encoding and a trainable mask token. It produces per-timestep contextual embeddings that can be aggregated into a fixed-size representation for downstream tasks such as few-shot classification.

Default configuration: 6 attention blocks, 4 heads, head dimension 64 (d_model = 256), sequence length 200, embedding dimension 256.

Input data format

Raw light curves are pairs (time, mag):

  • time — observation time in days. Need not be absolute MJD; any consistent time axis in days works because the pipeline subtracts the per-window mean before the encoder sees it. The pretrained weights were produced from MACHO data with MJD ~48800–51700.
  • mag — magnitude. MACHO instrumental magnitudes are typically negative (e.g. −10 to −3); the pipeline is not restricted to that range.

Photometric errors are not used at inference. The upstream preprocessing code expects a 3-column [time, mag, err] array internally, but errors only appear in the pretraining reconstruction-loss weights (outputs['w_error']), which are never passed to the encoder. Pass dummy zeros if you run the pipeline directly.
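For example, a raw light curve can be assembled into the 3-column [time, mag, err] array the upstream loaders expect, with dummy zero errors (array names here are illustrative, not the upstream API):

```python
import numpy as np

# Illustrative light curve: 5 observations (time in days, magnitude).
time = np.array([100.0, 101.3, 105.7, 110.2, 112.9])
mag = np.array([-7.2, -7.4, -7.1, -7.3, -7.2])

# The pipeline expects [time, mag, err]; errors are unused at inference,
# so dummy zeros are sufficient.
lc = np.stack([time, mag, np.zeros_like(mag)], axis=1)  # shape [n_obs, 3]
```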

Preprocessing steps

All steps are implemented in code/src/data/loaders.py (get_loader) and code/src/data/preprocessing.py.

Step 1 — Windowing

The upstream code supports two windowing strategies via the sampling flag of to_windows:

  • sampling=True — random window (used during pretraining): a single contiguous window of 200 observations is drawn at a uniformly random starting position. Light curves shorter than 200 observations are used in full.
  • sampling=False — sequential windows (used for test-data generation): the light curve is divided into sequential, non-overlapping windows of 200 observations. A light curve of length L yields ⌈L/200⌉ windows; the last window may be shorter than 200 and is padded in step 3. Light curves shorter than 200 observations produce a single window. When a light curve produces multiple windows, each window yields a separate embedding vector; to obtain a single per-light-curve embedding, average the per-window embeddings.

Test-data is generated with sampling=False.

Source: src/data/preprocessing.py:to_windows.
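The sequential strategy can be sketched as follows (a minimal sketch of the semantics described above; `sequential_windows` is an illustrative name, not the upstream function):

```python
import numpy as np

def sequential_windows(lc, max_len=200):
    """Split a [n_obs, 3] light curve into non-overlapping windows of at
    most max_len observations; the last window may be shorter."""
    n = len(lc)
    n_windows = max(1, -(-n // max_len))  # ceil(n / max_len), at least 1
    return [lc[i * max_len:(i + 1) * max_len] for i in range(n_windows)]
```

Each window is then normalised, padded, and embedded independently; averaging the resulting embedding vectors gives one representation per light curve.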

Step 2 — Zero-mean normalization

Subtract the per-window column mean from each column:

x_norm = x - mean(x, axis=0)   # x has shape [n_obs, 3]; columns: time, mag, err

After this step times = time − mean(time) and input = mag − mean(mag) are centred around zero.

Source: src/data/preprocessing.py:standardize.
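In NumPy, the per-window centering above is a one-liner (a sketch of the behaviour, not the upstream implementation):

```python
import numpy as np

def standardize(x):
    """Subtract the per-window column mean; x has shape [n_obs, 3]."""
    return x - x.mean(axis=0, keepdims=True)
```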

Step 3 — Padding and mask construction

Right-pad the normalised sequence to exactly 200 time steps with zeros. Construct mask_in:

mask_in[i] = 0   for i < n_obs   (real observation — visible to encoder)
mask_in[i] = 1   for i >= n_obs  (padding — hidden from encoder)

Note on mask convention: the internal pipeline uses mask_in=0 for visible positions and mask_in=1 for padding/hidden positions. This is the opposite of the ONNX interface (see below).

Source: src/data/masking.py:mask_sample, padding block at the end.
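Padding and mask construction can be sketched like this, using the internal convention above (`pad_and_mask` is an illustrative name, not the upstream API):

```python
import numpy as np

def pad_and_mask(x, max_len=200):
    """Right-pad a [n_obs, 3] window to max_len with zeros and build the
    internal mask (0 = real observation, 1 = padding)."""
    n_obs = len(x)
    padded = np.zeros((max_len, x.shape[1]), dtype=x.dtype)
    padded[:n_obs] = x
    mask_in = np.ones((max_len, 1), dtype=x.dtype)
    mask_in[:n_obs] = 0.0
    return padded, mask_in
```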

Step 4 — Format encoder inputs

Extract the two encoder inputs from the normalised, padded array:

  Tensor    Source                        Shape
  input     normalised magnitude column   [batch, 200, 1]
  times     normalised time column        [batch, 200, 1]
  mask_in   constructed in step 3         [batch, 200, 1]

The normalised error column is not fed to the encoder. Errors appear only in the pretraining reconstruction loss.

Source: src/data/loaders.py:format_inp_astromer (aversion='base').
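Steps 2-4 can be combined into one self-contained helper producing the three encoder tensors (a sketch under the conventions described above; the function name is illustrative):

```python
import numpy as np

def make_encoder_inputs(time, mag, max_len=200):
    """Zero-mean normalise, pad, and split one window into the three
    encoder tensors. Internal mask convention: 0 = visible, 1 = padding."""
    n = len(time)
    inp = np.zeros((max_len, 1), dtype=np.float32)
    times = np.zeros((max_len, 1), dtype=np.float32)
    mask_in = np.ones((max_len, 1), dtype=np.float32)
    inp[:n, 0] = mag - mag.mean()      # step 2: centre magnitudes
    times[:n, 0] = time - time.mean()  # step 2: centre times
    mask_in[:n] = 0.0                  # step 3: mark real observations
    # Add a leading batch axis: [1, max_len, 1].
    return inp[None], times[None], mask_in[None]
```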

Inputs (ONNX)

The exported ONNX models use a user-friendly mask convention that is the inverse of the internal pipeline:

  Tensor    Shape            Description
  input     [batch, 200, 1]  mag − mean(mag) over the window (step 2 above)
  times     [batch, 200, 1]  time − mean(time) over the window (step 2 above)
  mask_in   [batch, 200, 1]  1 = valid observation, 0 = padding

The ONNX wrapper inverts mask_in internally before passing it to the encoder, so consumers can use the intuitive convention.
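Concretely, converting between the two conventions is a single complement (values shown are illustrative):

```python
import numpy as np

# Internal convention (step 3): 0 = real observation, 1 = padding.
mask_internal = np.array([[0.], [0.], [0.], [1.], [1.]])

# ONNX convention: 1 = valid observation, 0 = padding.
mask_onnx = 1.0 - mask_internal
```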

Outputs (ONNX)

  File                 Output shape       Aggregation
  astromer2_mean.onnx  [batch, 256]       Masked mean pooling: sum(z * mask_in) / sum(mask_in)
  astromer2_max.onnx   [batch, 256]       Masked max pooling over valid timesteps
  astromer2_full.onnx  [batch, 200, 256]  Full per-timestep sequence; consumer aggregates

ONNX opset: 13.
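If you use astromer2_full.onnx and aggregate yourself, the two pooled variants can be reproduced as follows (mean pooling matches the formula in the table; the -inf trick for max pooling is my implementation choice, assuming padded positions should never win the max):

```python
import numpy as np

def masked_mean_pool(z, mask_in):
    """z: [batch, 200, 256] per-timestep embeddings;
    mask_in: [batch, 200, 1], ONNX convention (1 = valid)."""
    return (z * mask_in).sum(axis=1) / mask_in.sum(axis=1)

def masked_max_pool(z, mask_in):
    """Max over valid timesteps; padded positions are set to -inf first
    so they cannot contribute to the maximum."""
    z_masked = np.where(mask_in > 0, z, -np.inf)
    return z_masked.max(axis=1)
```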

Weights

Source: Zenodo record 18207945
Training dataset: MACHO (1.5 million light curves, V and R bands)
Checkpoint: astromer_v2/macho/

The test-data parquet file was generated with these MACHO weights and sampling=False (sequential windows).
