AstroCo
Self-supervised representation learning for irregular, sparsely-sampled astronomical light curves.
AstroCo pretrains a Conformer-style encoder on raw MACHO R-band light curves with a masked-reconstruction objective, then transfers the frozen embedding to downstream variable-star classification with very few labels. It improves reconstruction error by 61 to 70% over the Astromer baselines and sets a stronger few-shot transfer point on the Alcock benchmark.
First-author work, NeurIPS 2025 ML4PS workshop.
Checkpoints
| Model | Layers | Width (d) | Params | Recon RMSE | R² | Pretrain GPU |
|---|---|---|---|---|---|---|
| astroco_s.ckpt | 4 | 276 (4 heads x 69) | 5.9M | 0.060 | 0.922 | A100 80GB |
| astroco_l.ckpt | 12 | 256 (4 heads x 64) | 15.2M | 0.044 | 0.956 | H200 |
Both are PyTorch Lightning checkpoints. RMSE is masked-reconstruction error on held-out MACHO R-band; lower is better.
Results
Reconstruction (MACHO R-band, masked)
| Model | RMSE |
|---|---|
| Astromer v1 | 0.148 |
| Astromer v2 | 0.113 |
| AstroCo-S | 0.060 |
| AstroCo-L | 0.044 |
AstroCo-L is 70% below Astromer v1 and 61% below Astromer v2.
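The percentages follow directly from the table. As a quick check, the sketch below recomputes the relative RMSE reduction for each AstroCo model against each Astromer baseline; the helper name `relative_reduction` is introduced here for illustration only.

```python
# RMSE values copied from the reconstruction table above.
rmse = {
    "Astromer v1": 0.148,
    "Astromer v2": 0.113,
    "AstroCo-S": 0.060,
    "AstroCo-L": 0.044,
}


def relative_reduction(baseline: float, model: float) -> float:
    """Percent by which the model RMSE sits below the baseline RMSE."""
    return 100.0 * (1.0 - model / baseline)


for baseline in ("Astromer v1", "Astromer v2"):
    for model in ("AstroCo-S", "AstroCo-L"):
        pct = relative_reduction(rmse[baseline], rmse[model])
        print(f"{model} vs {baseline}: {pct:.1f}% lower RMSE")
```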
Few-shot transfer (Alcock, frozen encoder + linear head, macro-F1 %)
The encoder is frozen after pretraining; only a linear probe is trained on a small number of labels per class. Scores are 3-fold averages.
| Labels / class | AstroCo-S | AstroCo-L |
|---|---|---|
| 20 | 66.61 | 67.57 |
| 100 | 74.85 | 75.88 |
| 500 | 79.10 | 79.23 |
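AstroCo-L leads AstroCo-S by about 1 point at 20 and 100 labels per class and by 0.13 points at 500, so the gain from the larger model is concentrated in the low-label regime, which is where a transferable representation matters most.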
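For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The sketch below is a minimal NumPy version, not the evaluation code used for the table; the function name `macro_f1`, the toy labels, and the convention that a class with no true or predicted samples scores 0 are all assumptions made here.

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes: int) -> float:
    """Unweighted mean of per-class F1, returned as a percentage."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Assumed convention: a class that never appears scores 0.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return 100.0 * float(np.mean(scores))


# Toy labels for illustration only (not Alcock data).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 2]
print(f"macro-F1: {macro_f1(toy_true, toy_pred, n_classes=3):.2f}%")
```

A reported score would then be the mean of this quantity over the three folds.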
Architecture
A Conformer-style encoder built for irregular time series. Each block stacks:
- Multi-head self-attention for long-range structure (4 heads).
- A depthwise-separable convolution module (kernel 32) for local shape. The ablation in the paper shows this convolution is the dominant few-shot contributor: removing it drops 20-shot macro-F1 below the baseline.
- A gated (GLU) feed-forward block, with residual skips and LayerNorm throughout.
Positional information uses an Astromer-style embedding so the model reads irregular sampling directly. Pretraining is masked reconstruction: 50% of points probed, 60% masked, with a learned mask token. Inputs are 200-point windows with brightness and time zero-mean normalized; training runs in fp16 with DDP.
Encoder block source: astro_model_arch/Astroco.py. Full hyperparameters: astro_model_arch/hyparams_astroco_{s,l}.yaml.
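To make the convolution module concrete, here is a generic Conformer-style depthwise-separable convolution in PyTorch. It is an illustrative sketch, not the code in astro_model_arch/Astroco.py: the class name `ConvModuleSketch`, the layer ordering, the SiLU activation, and the explicit asymmetric padding for the even kernel are all assumptions. Only the kernel size (32), the width (256), and the use of GLU and LayerNorm come from the description above.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ConvModuleSketch(nn.Module):
    """Generic Conformer-style conv module (hypothetical, for illustration)."""

    def __init__(self, d_model: int = 256, kernel_size: int = 32):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Pointwise expansion to 2*d so the GLU gate can halve it back to d.
        self.pointwise_in = nn.Conv1d(d_model, 2 * d_model, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        # groups=d_model makes the convolution depthwise (one filter per channel).
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.act = nn.SiLU()
        self.pointwise_out = nn.Conv1d(d_model, d_model, kernel_size=1)
        # An even kernel needs asymmetric padding to keep the sequence length.
        self.pad = ((kernel_size - 1) // 2, kernel_size - 1 - (kernel_size - 1) // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)      # (batch, d_model, time)
        y = self.glu(self.pointwise_in(y))    # (batch, d_model, time)
        y = self.depthwise(F.pad(y, self.pad))
        y = self.pointwise_out(self.act(y)).transpose(1, 2)
        return x + y                          # residual skip


block = ConvModuleSketch(d_model=256, kernel_size=32)
x = torch.randn(2, 200, 256)                  # two 200-point windows
print(block(x).shape)
```

With kernel 32, each output position sees roughly 32 neighbouring observations, which is the "local shape" role the list above assigns to this module.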
Intended use
- Extract a fixed light-curve embedding for downstream classification, regression, or retrieval.
- Few-shot variable-star classification with a frozen encoder and a small linear head.
- A starting point for fine-tuning on other irregular survey data.
How to load
```python
import torch
import yaml

from astro_model_arch.Astroco import Astroco  # encoder definition in this repo

# The yaml keys are the constructor kwargs.
with open("astro_model_arch/hyparams_astroco_l.yaml") as f:
    cfg = yaml.safe_load(f)
model = Astroco(**cfg)

# Lightning checkpoints nest the weights under "state_dict"; fall back to a raw state dict.
ckpt = torch.load("astroco_l.ckpt", map_location="cpu")
model.load_state_dict(ckpt.get("state_dict", ckpt), strict=False)
model.eval()
```
Use hyparams_astroco_s.yaml with astroco_s.ckpt. The checkpoints carry the Lightning training state, so strict=False skips the loss and mask-token buffers when you only want the encoder.
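Because strict=False silently ignores any key that does not match, it is worth inspecting what was actually loaded. The sketch below shows one way to do that, using a tiny stand-in module rather than the real encoder. The helper `load_encoder_weights`, the `"model."` key prefix, and the fake checkpoint contents are hypothetical; the real checkpoint keys may differ.

```python
import torch
from torch import nn


def load_encoder_weights(model: nn.Module, state_dict: dict, prefix: str = "model."):
    """Load with strict=False and report which keys did not match.

    `prefix` is a hypothetical wrapper prefix; adjust or drop it after
    looking at the real checkpoint keys.
    """
    cleaned = {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }
    result = model.load_state_dict(cleaned, strict=False)
    return result.missing_keys, result.unexpected_keys


# Stand-in encoder and checkpoint, for illustration only.
encoder = nn.Linear(4, 2)
fake_ckpt = {
    "model.weight": torch.zeros(2, 4),
    "model.bias": torch.zeros(2),
    "loss.scale": torch.ones(1),  # training-only entry the encoder does not own
}
missing, unexpected = load_encoder_weights(encoder, fake_ckpt)
print("missing keys:", missing)        # weights the model expected but did not get
print("unexpected keys:", unexpected)  # checkpoint entries that were skipped
```

An empty missing list with only training-state entries under unexpected is the outcome to look for; a long missing list suggests the key names did not line up and the encoder is still at its random initialization.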
Input and output format
The model reads one nested dict. Each tensor is shaped (batch, window, 1) with window = 200, single band (_0, MACHO R). Brightness and time are zero-mean normalized per window; errors are the photometric uncertainties.
```python
# Continues from the loading snippet (uses `torch` and `model`); values are random placeholders.
B, L = 4, 200
batch = {"LC": {
    "brightness_0": torch.randn(B, L, 1),      # magnitudes, zero-mean normalized
    "time_0": torch.randn(B, L, 1),            # observation times, zero-mean normalized
    "brightness_err_0": torch.randn(B, L, 1),  # photometric errors
}}
with torch.no_grad():  # inference only, no gradients needed
    emb, _ = model(batch)
z = emb["LC"]["z_emb_12"]  # (B, 200, 256): layer-weighted embedding from AstroCo-L
```
The forward pass returns per-layer embeddings under z_emb_0 .. z_emb_N; the last index holds the softmax-weighted sum across layers and is the one to use. For AstroCo-L that key is z_emb_12 (width 256); for AstroCo-S it is z_emb_4 (width 276). Pool over the time axis (mean, or a CLS-style aggregate) to get a per-light-curve vector for a downstream head.
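The pooling step can be sketched as a mean over the time axis that optionally ignores padded positions. This is a minimal example under assumptions: the function name `masked_mean_pool` and the `valid` mask argument are introduced here, and the random tensor stands in for a real `z_emb_12` output.

```python
import torch


def masked_mean_pool(z: torch.Tensor, valid: torch.Tensor = None) -> torch.Tensor:
    """Average a (batch, time, dim) embedding over time.

    `valid` is an optional (batch, time) mask, nonzero where the point is a
    real observation; padded positions are left out of the average.
    """
    if valid is None:
        return z.mean(dim=1)
    w = valid.to(z.dtype).unsqueeze(-1)                        # (batch, time, 1)
    return (z * w).sum(dim=1) / w.sum(dim=1).clamp(min=1.0)   # (batch, dim)


z = torch.randn(4, 200, 256)          # placeholder for emb["LC"]["z_emb_12"]
valid = torch.ones(4, 200, dtype=torch.bool)
valid[:, 150:] = False                # pretend the last 50 points are padding
pooled = masked_mean_pool(z, valid)
print(pooled.shape)                   # one 256-d vector per light curve
```

The resulting per-curve vectors are what a linear head would consume.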
Training data
- MACHO is a long-baseline photometric survey of the Magellanic Clouds and Galactic bulge. AstroCo pretrains on its R-band light curves, self-supervised, with no labels.
- Alcock is the labeled variable-star benchmark derived from MACHO. It is the downstream task: a linear probe trained on the frozen encoder, with the few-shot label budgets above.
- Raw format: each light curve is three parallel 1-D arrays, time_0 (MJD, in days), brightness_0 (R-band magnitude), and brightness_err_0 (photometric error), plus an ID. Length is variable, a few hundred points per curve. Stored as WebDataset .tar.gz shards (one .pth per array), split into train / val / test folds.
- Into the model: each curve is zero-mean normalized in time and brightness, sliced into 200-point windows, and given an attention mask, which is the (batch, 200, 1) dict the forward pass expects.
- Download links and the labeled 3-fold split are in classification_data_link.md.
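The raw-to-model path described above can be sketched in NumPy as follows. This is not the repo's data pipeline: the function name `make_windows`, the non-overlapping windows, the zero padding of the final short window, the per-window centering, and the `mask` key are assumptions, and the light curve is synthetic.

```python
import numpy as np


def make_windows(time, brightness, err, window: int = 200) -> dict:
    """Slice one light curve into fixed-length, zero-mean windows plus a mask."""
    time, brightness, err = (
        np.asarray(a, dtype=np.float64) for a in (time, brightness, err)
    )
    n_windows = int(np.ceil(len(time) / window))
    keys = ("time_0", "brightness_0", "brightness_err_0", "mask")
    out = {k: np.zeros((n_windows, window, 1)) for k in keys}
    for i in range(n_windows):
        sl = slice(i * window, (i + 1) * window)
        k = len(time[sl])  # the last window may be shorter than `window`
        out["time_0"][i, :k, 0] = time[sl] - time[sl].mean()
        out["brightness_0"][i, :k, 0] = brightness[sl] - brightness[sl].mean()
        out["brightness_err_0"][i, :k, 0] = err[sl]
        out["mask"][i, :k, 0] = 1.0  # 1 = real observation, 0 = padding
    return out


# Synthetic, irregularly sampled curve for illustration (not MACHO data).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1000.0, size=450))  # days
mag = rng.normal(loc=-5.0, scale=0.2, size=450)  # placeholder magnitudes
mag_err = rng.uniform(0.01, 0.05, size=450)      # placeholder uncertainties

windows = make_windows(t, mag, mag_err)
print({k: v.shape for k, v in windows.items()})
```

Each array comes out as (n_windows, 200, 1), matching the per-tensor shape the forward pass expects once converted to torch tensors.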
Limitations
- Trained on single-band (R) MACHO data; transfer to other surveys or bands is untested here.
- The few-shot numbers are linear-probe transfer on Alcock, not end-to-end fine-tuning.
- Reconstruction RMSE is a pretraining proxy, not a science metric on its own.
Citation
@inproceedings{tan2025astroco,
title = {AstroCo: Self-Supervised Representation Learning for Irregular Astronomical Light Curves},
author = {Tan, Antony},
booktitle = {NeurIPS 2025 Workshop on Machine Learning and the Physical Sciences (ML4PS)},
year = {2025}
}
Links
- Model and data download links: classification_data_link.md
- Encoder and hyperparameters: astro_model_arch/
- Per-run test logs: astroco_results/test_results/