AstroCo

Self-supervised representation learning for irregular, sparsely-sampled astronomical light curves.

AstroCo pretrains a Conformer-style encoder on raw MACHO R-band light curves with a masked-reconstruction objective, then transfers the frozen embedding to downstream variable-star classification with very few labels. The larger model, AstroCo-L, cuts masked-reconstruction RMSE by 61 to 70% relative to the Astromer baselines and reports stronger few-shot transfer on the Alcock benchmark.

First-author work, NeurIPS 2025 ML4PS workshop.

Checkpoints

Model	Layers	Width (d)	Params	Recon RMSE	R²	Pretrain GPU
`astroco_s.ckpt`	4	276 (4 heads x 69)	5.9M	0.060	0.922	A100 80GB
`astroco_l.ckpt`	12	256 (4 heads x 64)	15.2M	0.044	0.956	H200

Both are PyTorch Lightning checkpoints. RMSE is masked-reconstruction error on held-out MACHO R-band; lower is better.

Results

Reconstruction (MACHO R-band, masked)

Model	RMSE
Astromer v1	0.148
Astromer v2	0.113
AstroCo-S	0.060
AstroCo-L	0.044

AstroCo-L is 70% below Astromer v1 and 61% below Astromer v2.
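The percentages above follow directly from the table. A minimal sketch of the arithmetic, using only the RMSE values listed here (the helper name `relative_reduction` is illustrative, not part of the repo):

```python
# RMSE values copied from the reconstruction table above.
rmse = {
    "Astromer v1": 0.148,
    "Astromer v2": 0.113,
    "AstroCo-S": 0.060,
    "AstroCo-L": 0.044,
}


def relative_reduction(baseline: float, model: float) -> float:
    """Percent by which `model` RMSE falls below `baseline` RMSE."""
    return 100.0 * (1.0 - model / baseline)


for name in ("Astromer v1", "Astromer v2"):
    pct = relative_reduction(rmse[name], rmse["AstroCo-L"])
    print(f"AstroCo-L vs {name}: {pct:.1f}% lower RMSE")
# AstroCo-L vs Astromer v1: 70.3% lower RMSE
# AstroCo-L vs Astromer v2: 61.1% lower RMSE
```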

Few-shot transfer (Alcock, frozen encoder + linear head, macro-F1 %)

The encoder is frozen after pretraining; only a linear probe is trained on a small number of labels per class. Scores are 3-fold averages.

Labels / class	AstroCo-S	AstroCo-L
20	66.61	67.57
100	74.85	75.88
500	79.10	79.23

The gain holds in the low-label regime, which is where a transferable representation matters most.
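The linear-probe protocol can be sketched as follows. This is an illustration under stated assumptions, not the evaluation script: the encoder is a small stand-in module (the real one is `Astroco` from this repo), the data are random, and the class count, optimizer, learning rate, and step count are all hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the pretrained encoder (hypothetical); swap in the loaded Astroco model.
encoder = nn.Sequential(nn.Linear(3, 256), nn.GELU())
encoder.requires_grad_(False)  # freeze every encoder parameter
encoder.eval()

# Fake few-shot set: 20 labels per class, 200-point windows, 3 input channels.
n_classes, n_per_class, window = 5, 20, 200  # n_classes is an assumption
x = torch.randn(n_classes * n_per_class, window, 3)
y = torch.arange(n_classes).repeat_interleave(n_per_class)

# Embed once with the frozen encoder, then mean-pool over the time axis.
with torch.no_grad():
    feats = encoder(x).mean(dim=1)  # (N, 256)

# Only this linear head is trained.
probe = nn.Linear(feats.shape[-1], n_classes)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(feats), y)
    loss.backward()
    opt.step()

pred = probe(feats).argmax(dim=-1)  # predicted class per light curve
```

Because the encoder never receives gradients, the embeddings can be computed once and cached, which keeps each label budget cheap to evaluate.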

Architecture

A Conformer-style encoder built for irregular time series. Each block stacks:

Multi-head self-attention for long-range structure (4 heads).
A depthwise-separable convolution module (kernel 32) for local shape. The ablation in the paper shows this convolution is the dominant few-shot contributor: removing it drops 20-shot macro-F1 below the baseline.
A gated (GLU) feed-forward block, with residual skips and LayerNorm throughout.
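For orientation, here is a minimal sketch of a block with these three parts. It is not the code in astro_model_arch/Astroco.py: the sub-layer order, pre-norm placement, GELU activation, and feed-forward expansion factor are assumptions, and the class name `ConformerStyleBlock` is hypothetical. Only the 4 heads, kernel size 32, GLU gating, residual skips, and LayerNorm come from the description above.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ConformerStyleBlock(nn.Module):
    """Illustrative block: self-attention, depthwise-separable conv, GLU feed-forward."""

    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 kernel_size: int = 32, ff_mult: int = 4):
        super().__init__()
        self.norm_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        self.norm_conv = nn.LayerNorm(d_model)
        # Split the k-1 padding across both sides so the sequence length is kept.
        self.pad = ((kernel_size - 1) // 2, kernel_size - 1 - (kernel_size - 1) // 2)
        # Depthwise: one filter per channel (groups=d_model). Pointwise: 1x1 channel mix.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)

        self.norm_ff = nn.LayerNorm(d_model)
        hidden = ff_mult * d_model
        self.ff_in = nn.Linear(d_model, 2 * hidden)  # doubled: value half and gate half
        self.ff_out = nn.Linear(hidden, d_model)

    def forward(self, x, key_padding_mask=None):
        # x: (batch, length, d_model)
        h = self.norm_attn(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask,
                                need_weights=False)
        x = x + attn_out  # residual skip around attention

        h = self.norm_conv(x).transpose(1, 2)  # (batch, d_model, length) for Conv1d
        h = self.depthwise(F.pad(h, self.pad))
        h = self.pointwise(F.gelu(h)).transpose(1, 2)
        x = x + h  # residual skip around the convolution module

        h = F.glu(self.ff_in(self.norm_ff(x)), dim=-1)  # value * sigmoid(gate)
        return x + self.ff_out(h)  # residual skip around the feed-forward


block = ConformerStyleBlock()
out = block(torch.randn(2, 200, 256))  # shape is preserved: (2, 200, 256)
```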

Positional information uses an Astromer-style embedding so the model reads irregular sampling directly. Pretraining is masked reconstruction: 50% of points probed, 60% masked, with a learned mask token. Inputs are 200-point windows with brightness and time zero-mean normalized. Training runs in fp16 with DDP (distributed data parallel).
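One way to read the masking scheme is sketched below. It assumes that 50% of the positions in each window are selected as probed (the loss is computed only there) and that 60% of those probed positions are replaced by the learned mask token. That interpretation, the function name `sample_masks`, and the random stand-in tensors are assumptions; the hyperparameter files are the authority on the actual procedure.

```python
import torch
from torch import nn

torch.manual_seed(0)


def sample_masks(batch: int, length: int,
                 probed_frac: float = 0.5, masked_frac: float = 0.6):
    """Return boolean (batch, length) tensors: probed positions and masked positions."""
    n_probed = int(round(probed_frac * length))
    n_masked = int(round(masked_frac * n_probed))  # masked is a subset of probed
    order = torch.rand(batch, length).argsort(dim=1)  # random permutation per row
    rows = torch.arange(batch).unsqueeze(1)
    probed = torch.zeros(batch, length, dtype=torch.bool)
    masked = torch.zeros(batch, length, dtype=torch.bool)
    probed[rows, order[:, :n_probed]] = True
    masked[rows, order[:, :n_masked]] = True
    return probed, masked


d_model = 256
mask_token = nn.Parameter(torch.zeros(d_model))  # learned vector for hidden points
x_emb = torch.randn(4, 200, d_model)             # stand-in for embedded inputs

probed, masked = sample_masks(4, 200)
# Swap the embedding for the mask token wherever a point is masked.
x_in = torch.where(masked.unsqueeze(-1), mask_token, x_emb)

# Reconstruction error is measured on probed positions only (stand-in tensors).
pred, target = torch.randn(4, 200), torch.randn(4, 200)
rmse = torch.sqrt(((pred - target)[probed] ** 2).mean())
```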

Encoder block source: astro_model_arch/Astroco.py. Full hyperparameters: astro_model_arch/hyparams_astroco_{s,l}.yaml.

Intended use

Extract a fixed light-curve embedding for downstream classification, regression, or retrieval.
Few-shot variable-star classification with a frozen encoder and a small linear head.
A starting point for fine-tuning on other irregular survey data.

How to load

import torch
import yaml
from astro_model_arch.Astroco import Astroco  # encoder definition in this repo

with open("astro_model_arch/hyparams_astroco_l.yaml") as f:
    cfg = yaml.safe_load(f)  # the yaml keys are the constructor kwargs
model = Astroco(**cfg)
ckpt = torch.load("astroco_l.ckpt", map_location="cpu")
# Lightning checkpoints nest the weights under "state_dict"; fall back to a bare state dict.
model.load_state_dict(ckpt.get("state_dict", ckpt), strict=False)
model.eval()  # switch to inference mode

Use hyparams_astroco_s.yaml with astroco_s.ckpt. The checkpoints carry the Lightning training state, so strict=False skips the loss and mask-token buffers when you only want the encoder.
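Because strict=False also hides genuine mismatches (for example, if the checkpoint keys carried a prefix the model does not use, which is a hypothetical case here), it is worth inspecting what `load_state_dict` reports. The sketch below uses a tiny stand-in module and a hand-built state dict rather than the real checkpoint; the key names are invented for illustration.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the real encoder."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3, 8)


# Fake checkpoint: one matching key, one training-only extra, and no bias.
state = {"proj.weight": torch.zeros(8, 3), "loss_fn.scale": torch.ones(1)}

model = TinyEncoder()
result = model.load_state_dict(state, strict=False)

print("missing:", result.missing_keys)        # in the model, absent from the checkpoint
print("unexpected:", result.unexpected_keys)  # in the checkpoint, absent from the model
# missing: ['proj.bias']
# unexpected: ['loss_fn.scale']
```

If `missing_keys` lists encoder weights, those layers kept their random initialization and the embeddings will not be meaningful.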

Input and output format

The model reads one nested dict. Each tensor is shaped (batch, window, 1) with window = 200, single band (_0, MACHO R). Brightness and time are zero-mean normalized per window; errors are the photometric uncertainties.

B, L = 4, 200
batch = {"LC": {
    "brightness_0":     torch.randn(B, L, 1),  # magnitudes, zero-mean normalized
    "time_0":           torch.randn(B, L, 1),  # observation times, zero-mean normalized
    "brightness_err_0": torch.rand(B, L, 1),   # photometric errors (non-negative)
}}

with torch.no_grad():  # inference only, no gradients needed
    emb, _ = model(batch)
z = emb["LC"]["z_emb_12"]   # (B, 200, 256): layer-weighted embedding from AstroCo-L

The forward pass returns per-layer embeddings under z_emb_0 .. z_emb_N; the last index holds the softmax-weighted sum across layers and is the one to use. For AstroCo-L that key is z_emb_12 (width 256); for AstroCo-S it is z_emb_4 (width 276). Pool over the time axis (mean, or a CLS-style aggregate) to get a per-light-curve vector for a downstream head.
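A minimal sketch of mean pooling over the time axis that ignores padded positions. The `valid` tensor and its convention (1 for a real observation, 0 for padding) are assumptions for illustration, as is the function name; `z` stands in for the embedding returned by the model.

```python
import torch


def masked_mean_pool(z: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Average z over time, counting only valid points.

    z:     (batch, length, dim) per-point embeddings
    valid: (batch, length, 1), 1 for real observations and 0 for padding
    Returns a (batch, dim) tensor with one vector per light curve.
    """
    valid = valid.to(z.dtype)
    total = (z * valid).sum(dim=1)
    count = valid.sum(dim=1).clamp(min=1.0)  # avoid division by zero on empty windows
    return total / count


z = torch.randn(4, 200, 256)    # stand-in for emb["LC"]["z_emb_12"]
valid = torch.ones(4, 200, 1)
valid[0, 150:] = 0              # pretend the first window has 50 padded points

pooled = masked_mean_pool(z, valid)  # (4, 256)
```

With no padding this reduces to `z.mean(dim=1)`.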

Training data

MACHO is a long-baseline photometric survey of the Magellanic Clouds and Galactic bulge. AstroCo pretrains on its R-band light curves, self-supervised, with no labels.
Alcock is the labeled variable-star benchmark derived from MACHO. It is the downstream task: a linear probe trained on the frozen encoder, with the few-shot label budgets above.
Raw format: each light curve is three parallel 1-D arrays, time_0 (MJD, in days), brightness_0 (R-band magnitude), and brightness_err_0 (photometric error), plus an ID. Length is variable, a few hundred points per curve. Stored as WebDataset .tar.gz shards (one .pth per array), split into train / val / test folds.
Into the model: each curve is zero-mean normalized in time and brightness, sliced into 200-point windows, and paired with an attention mask. The result is the dict of (batch, 200, 1) tensors the forward pass expects.
Download links and the labeled 3-fold split are in classification_data_link.md.
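The "into the model" step can be sketched as below, assuming non-overlapping windows, zero padding for the final short window, and per-window mean subtraction. The function name `make_windows` and the `mask` key are hypothetical; the repo's data loader may window, pad, and name things differently.

```python
import torch

torch.manual_seed(0)


def make_windows(time, brightness, err, window: int = 200):
    """Slice one light curve (three 1-D tensors) into (n_windows, window, 1) tensors."""
    n = time.shape[0]
    n_win = max(1, -(-n // window))  # ceiling division
    keys = ("time_0", "brightness_0", "brightness_err_0", "mask")
    out = {k: torch.zeros(n_win, window, 1) for k in keys}
    for i in range(n_win):
        sl = slice(i * window, min((i + 1) * window, n))
        m = sl.stop - sl.start  # number of real points in this window
        out["time_0"][i, :m, 0] = time[sl] - time[sl].mean()
        out["brightness_0"][i, :m, 0] = brightness[sl] - brightness[sl].mean()
        out["brightness_err_0"][i, :m, 0] = err[sl]
        out["mask"][i, :m, 0] = 1.0  # 1 marks a real observation, 0 marks padding
    return out


# Synthetic curve with 450 points: two full windows plus one with 50 real points.
t = torch.sort(torch.rand(450) * 1000.0).values  # days since first observation
mag = 18.0 + 0.1 * torch.randn(450)              # R-band magnitudes
sigma = 0.02 + 0.01 * torch.rand(450)            # photometric errors

w = make_windows(t, mag, sigma)
batch = {"LC": {k: w[k] for k in ("time_0", "brightness_0", "brightness_err_0")}}
```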

Limitations

Trained on single-band (R) MACHO data; transfer to other surveys or bands is untested here.
The few-shot numbers are linear-probe transfer on Alcock, not end-to-end fine-tuning.
Reconstruction RMSE is a pretraining proxy, not a science metric on its own.

Citation

@inproceedings{tan2025astroco,
  title     = {AstroCo: Self-Supervised Representation Learning for Irregular Astronomical Light Curves},
  author    = {Tan, Antony},
  booktitle = {NeurIPS 2025 Workshop on Machine Learning and the Physical Sciences (ML4PS)},
  year      = {2025}
}
