🌟 Supernova Peak Predictor

The first model to predict when and how bright a supernova will become from its earliest ZTF alert observations — with uncertainty estimates.

Given a single ZTF alert packet from a rising supernova, this model predicts:

days_to_peak — how many days until the supernova reaches maximum brightness
peakmag — the peak apparent magnitude it will achieve
80% prediction intervals — uncertainty bounds via quantile regression

This enables astronomers to answer: "Should I point the telescope at this target tonight, or can it wait?"

🔭 Try it live: Interactive Demo (hawthorneluke/supernova-peak-predictor-demo)

Why this matters

The bottleneck in transient astronomy isn't detection — ZTF finds thousands of candidates per night. The bottleneck is follow-up telescope time. Spectroscopic observations are expensive and limited. Every night, astronomers must decide which of dozens of candidates to prioritize.

Currently, that decision is reactive: "Is this a supernova? Yes → follow up." (BTSbot solves this with 98.5% accuracy.)

Our model makes it proactive: "This SN will peak at mag 17.8 in 5±3 days → schedule it now" vs "This one won't peak for 3 months → deprioritize."

With the Vera C. Rubin Observatory (LSST) coming online, alert rates will jump from ~100K/night to ~10M/night. Automated triage like this will be essential.

Performance

Evaluated with 5-fold grouped cross-validation (grouped by supernova object ID to prevent data leakage — no alerts from the same SN appear in both train and validation).
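
The grouping constraint described above can be sketched in a few lines. This is a minimal pure-Python stand-in for a grouped K-fold splitter (similar in spirit to scikit-learn's `GroupKFold`), not the evaluation code used for this model; the object IDs and the helper name `grouped_kfold` are hypothetical.

```python
from collections import defaultdict


def grouped_kfold(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs in which no group spans both sides.

    Groups are assigned greedily, largest first, to whichever fold is
    currently smallest, which keeps fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train, val


# Hypothetical object IDs: several alerts per supernova.
object_ids = ["ZTF_a"] * 4 + ["ZTF_b"] * 3 + ["ZTF_c"] * 2 + ["ZTF_d"] * 2 + ["ZTF_e"]
splits = list(grouped_kfold(object_ids, n_splits=5))

for train_idx, val_idx in splits:
    train_objs = {object_ids[i] for i in train_idx}
    val_objs = {object_ids[i] for i in val_idx}
    # No supernova contributes alerts to both train and validation.
    assert train_objs.isdisjoint(val_objs)
```

Splitting at the alert level instead would let the model see later alerts of a supernova it is validated on, which would make the reported errors optimistic.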

Overall (27,202 alerts from 3,806 supernovae)

Target	MAE	Median AE	P90 AE
days_to_peak	118.8 days	29.8 days	327.2 days
peakmag	0.257 mag	0.178 mag	0.536 mag
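
For readers reproducing the table, the three columns can be computed as below. This is a sketch that assumes "P90" means the 90th percentile of the absolute error and uses a nearest-rank percentile; the helper name `error_summary` and the toy values are hypothetical.

```python
import math
import statistics


def error_summary(y_true, y_pred):
    """Return MAE, median absolute error, and 90th-percentile absolute error."""
    abs_err = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    n = len(abs_err)
    # Nearest-rank percentile (an assumption; interpolating conventions
    # give slightly different values on small samples).
    p90 = abs_err[max(0, math.ceil(0.9 * n) - 1)]
    return {
        "mae": sum(abs_err) / n,
        "median_ae": statistics.median(abs_err),
        "p90": p90,
    }


# Toy days_to_peak values with one badly missed long-horizon object.
true_days = [5, 10, 20, 40, 200]
pred_days = [7, 9, 26, 30, 80]
summary = error_summary(true_days, pred_days)
print(summary)
```

A single large miss pulls the MAE far above the median, which is the same pattern as the days_to_peak row (118.8 vs 29.8 days).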

By number of prior detections

Detection stage	n	MAE days	Median days	MAE mag	Median mag
1-3 (first catches)	4,817	70.8	16.5	0.391	0.304
4-10 (early rise)	9,683	114.4	26.7	0.273	0.197
11-50 (sampled rise)	11,029	132.4	36.3	0.199	0.147
50+ (monitored)	1,673	192.0	77.6	0.164	0.117

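Key finding: Timing accuracy is best where it is most valuable, on the first 1-3 detections, where the median timing error is 16.5 days. Magnitude accuracy is weakest at this stage (median error 0.304 mag) and improves steadily with more detections, reaching 0.117 mag beyond 50 detections.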

By true time-to-peak

Horizon	n	MAE days	Median days	MAE mag	Median mag
Imminent (<7d)	12,694	66.2	24.9	0.330	0.229
Soon (7-30d)	10,431	62.9	20.7	0.183	0.150
Weeks (30-100d)	1,086	75.5	30.2	0.211	0.148
Distant (100d+)	2,991	552.5	433.0	0.229	0.164

The "soon" horizon (7-30 days) is the sweet spot: exactly the window where scheduling decisions matter most, and where the model has its lowest timing error (median 20.7 days) alongside a 0.150 mag median magnitude error.
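
To put the magnitude errors in more physical terms, a magnitude difference converts to a flux ratio through the standard relation ratio = 10^(0.4 × Δm). The short sketch below applies it to two values from the tables above; the function name is hypothetical.

```python
def mag_error_to_flux_ratio(delta_mag):
    """Flux ratio corresponding to a magnitude difference (Pogson relation)."""
    return 10 ** (0.4 * delta_mag)


for label, dm in [("median, 7-30d horizon", 0.150), ("overall MAE", 0.257)]:
    ratio = mag_error_to_flux_ratio(dm)
    print(f"{label}: {dm:.3f} mag -> flux off by a factor of {ratio:.3f}")
```

A 0.150 mag error corresponds to roughly a 15% error in predicted peak flux, and 0.257 mag to roughly 27%.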

Uncertainty Quantification

Quantile regression models (10th, 50th, 90th percentile) provide 80% prediction intervals:

Target	Coverage	Median Interval Width
days_to_peak	71.0%	40.9 days
peakmag	69.4%	0.49 mag
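
Quantile models of this kind are usually fit by minimizing the pinball (quantile) loss, which LightGBM exposes as `objective="quantile"` with an `alpha` parameter. Whether the released models were trained exactly this way is an assumption; the sketch below only illustrates why the loss pushes a model toward a given percentile.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q (0 < q < 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-prediction (diff > 0) costs q per unit,
        # over-prediction (diff < 0) costs (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# True value 10; one prediction is 5 too low, the other 5 too high.
under_q90 = pinball_loss([10.0], [5.0], q=0.9)
over_q90 = pinball_loss([10.0], [15.0], q=0.9)
under_q10 = pinball_loss([10.0], [5.0], q=0.1)
over_q10 = pinball_loss([10.0], [15.0], q=0.1)
print(under_q90, over_q90, under_q10, over_q10)
```

At q = 0.9 an under-prediction is penalized nine times more heavily than an equally sized over-prediction, so the fitted model settles near the 90th percentile; at q = 0.1 the asymmetry is reversed.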

Coverage by detection stage:

Stage	Days coverage	Days width	Mag coverage	Mag width
1-3 detections	71.2%	42.8 days	69.9%	0.48 mag
4-10 detections	71.1%	40.3 days	69.8%	0.50 mag
11-50 detections	70.5%	41.4 days	69.0%	0.48 mag
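
The two quantities reported in these tables, empirical coverage and median interval width, can be computed from held-out predictions as follows. This is a sketch with made-up numbers; the helper name `interval_report` is hypothetical.

```python
import statistics


def interval_report(y_true, lower, upper):
    """Return (empirical coverage, median interval width)."""
    inside = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return sum(inside) / len(inside), statistics.median(widths)


# Toy days_to_peak values with their q10 and q90 predictions.
true_days = [3, 12, 25, 40, 9]
q10_days = [1, 5, 10, 20, 10]
q90_days = [8, 15, 22, 60, 30]

coverage, median_width = interval_report(true_days, q10_days, q90_days)
print(f"coverage = {coverage:.0%}, median width = {median_width} days")
```

For a well-calibrated 80% interval the coverage should be close to 0.80; values near 0.70, as in the tables above, mean the intervals are too narrow.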

Architecture

LightGBM gradient boosted trees on 49 engineered features extracted from ZTF alert metadata.

No images are used. We tested a ConvNeXt-pico CNN on the 63×63 ZTF image triplets and found that metadata alone outperforms the full multimodal model (peakmag MAE 0.257 vs 0.271 mag). The images appear to add noise rather than signal at this resolution. This is itself a useful finding: the predictor can run at alert-stream speed (microseconds per prediction, no GPU needed).

Top features (by LightGBM importance)

For days_to_peak: ncovhist (coverage history), distpsnr1 (distance to nearest PS1 source), distpsnr2, neargaia, maxmag_so_far

For peakmag: maggaia (Gaia magnitude), peakmag_so_far (brightest seen), sgscore1 (star/galaxy score), maxmag_so_far, ndethist

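The host galaxy properties (PS1 colors, star/galaxy scores, distances) rank highly for the timing prediction, which suggests that where a supernova lives (host type, distance, environment) constrains how it evolves. Note that the top-ranked timing feature, ncovhist, describes survey coverage of the position rather than the host.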

Model files

File	Description
`model_days.pkl`	Point estimate model for days_to_peak
`model_mag.pkl`	Point estimate model for peakmag
`model_days_q10.pkl`	10th percentile quantile model (days)
`model_days_q50.pkl`	50th percentile quantile model (days)
`model_days_q90.pkl`	90th percentile quantile model (days)
`model_mag_q10.pkl`	10th percentile quantile model (mag)
`model_mag_q50.pkl`	50th percentile quantile model (mag)
`model_mag_q90.pkl`	90th percentile quantile model (mag)
`features.py`	Feature engineering code
`model_info.json`	Feature columns, metrics, importance scores
`quantile_results.json`	Calibration results for quantile models

What we tried that didn't work

Approach	Result
ConvNeXt-pico CNN on 63×63 ZTF image triplets + metadata	6-20% worse than metadata-only across all bins
Simple MLP (114K params) on 23 raw features	Competitive but 2-5% worse than tree models with engineered features
Including `age` / `days_since_peak` as input features	Creates direct data leakage (`days_to_peak = age - days_since_peak`)

Training data

Source: MultimodalUniverse/btsbot — ZTF Bright Transient Survey alerts
Filter: Rise-phase supernovae only (is_rise=True, is_SN=True)
Size: 27,202 alerts from 3,806 unique supernovae
Splits: 5-fold GroupKFold by object ID (no alert-level leakage)

Usage

Point estimates

import json
import pickle

import numpy as np
from huggingface_hub import hf_hub_download

REPO_ID = "hawthorneluke/supernova-peak-predictor"

def load_pickle(filename):
    # Download (or reuse the cached copy of) a repo file and unpickle it.
    # Only unpickle files from sources you trust.
    with open(hf_hub_download(REPO_ID, filename), "rb") as f:
        return pickle.load(f)

# Download model files
model_days = load_pickle("model_days.pkl")
model_mag = load_pickle("model_mag.pkl")
with open(hf_hub_download(REPO_ID, "model_info.json")) as f:
    info = json.load(f)

# Your ZTF alert metadata (example)
from features import engineer_features  # download features.py from this repo
alert = {"magpsf": 19.2, "sigmapsf": 0.15, "ndethist": 3}  # ... plus the remaining ZTF alert fields
feats = engineer_features(alert)

# Predict (feature order must match the order used in training)
X = np.array([[feats[c] for c in info["feature_cols"]]], dtype=np.float32)
days_pred = model_days.predict(X)[0]
mag_pred = model_mag.predict(X)[0]
print(f"Predicted: peak in {days_pred:.1f} days at magnitude {mag_pred:.2f}")

With uncertainty intervals

# Load quantile models (reuses pickle, hf_hub_download, X, days_pred and mag_pred from the point-estimate example)
quantile_models = {}
for name in ("days_q10", "days_q90", "mag_q10", "mag_q90"):
    path = hf_hub_download("hawthorneluke/supernova-peak-predictor", f"model_{name}.pkl")
    with open(path, "rb") as f:
        quantile_models[name] = pickle.load(f)

# 80% prediction intervals (10th to 90th percentile)
days_lo, days_hi = quantile_models["days_q10"].predict(X)[0], quantile_models["days_q90"].predict(X)[0]
# For magnitudes, the smaller number is the brighter bound
mag_lo, mag_hi = quantile_models["mag_q10"].predict(X)[0], quantile_models["mag_q90"].predict(X)[0]
print(f"Days to peak: {days_pred:.1f} [{days_lo:.1f}, {days_hi:.1f}]")
print(f"Peak magnitude: {mag_pred:.2f} [{mag_lo:.2f}, {mag_hi:.2f}]")
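
One way these outputs might feed the "tonight or later?" decision is a simple rule on the interval bounds. The sketch below is illustrative only: the function `triage`, its thresholds, and its labels are hypothetical and not part of this repository.

```python
def triage(days_lo, days_hi, mag_pred,
           mag_limit=18.5, urgent_within=7.0, defer_beyond=30.0):
    """Map a predicted peak (interval in days, magnitude) to a follow-up label."""
    # Independently fitted quantile models can cross, so order the bounds.
    lo, hi = sorted((days_lo, days_hi))
    if mag_pred > mag_limit:
        # Larger magnitude means fainter: too faint for the instrument.
        return "skip"
    if hi <= urgent_within:
        # Even the late end of the interval is within a week.
        return "tonight"
    if lo >= defer_beyond:
        # Even the early end of the interval is a month or more away.
        return "defer"
    return "this_week"


print(triage(2.1, 6.4, 17.8))
print(triage(45.0, 120.0, 18.0))
print(triage(4.0, 20.0, 18.0))
```

Using the interval bounds rather than the point estimate makes the rule conservative: a target is only marked urgent or deferred when the whole interval agrees.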

Interactive demo

Try it live: hawthorneluke/supernova-peak-predictor-demo

Limitations

Long-horizon predictions are poor. For SNe >100 days from peak, the MAE is 552 days. The model essentially can't predict these — they're rare, slow-evolving transients with ambiguous early signatures.
Prediction intervals under-cover. The nominal 80% intervals achieve about 70% coverage in cross-validation (71.0% for days_to_peak, 69.4% for peakmag), roughly 10 percentage points short of the target. A conformal calibration step would improve this.
ZTF-specific. Features are tied to ZTF alert schema. Adaptation to LSST/Rubin alerts would require feature remapping.
No spectroscopic type prediction. We predict timing and brightness but not SN type (Ia vs II vs Ibc). This would be a natural extension.
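
The conformal calibration step mentioned in the limitations could take the form of conformalized quantile regression: measure how far held-out true values fall outside the raw intervals, then widen every interval by a fixed offset. The sketch below is one such recipe under stated assumptions, not something shipped with this model; the helper names and calibration numbers are hypothetical, and the calibration set should be held out by supernova object ID.

```python
import math


def cqr_offset(y_cal, lo_cal, hi_cal, alpha=0.2):
    """Offset that widens [lo, hi] to target (1 - alpha) coverage."""
    # Conformity score: positive when the true value falls outside the
    # interval, negative when it sits comfortably inside.
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    # Finite-sample rank; the tiny epsilon guards against float round-up.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    return scores[k - 1] if k <= n else math.inf


def widen(lo, hi, offset):
    """Apply the calibrated offset to a new raw interval."""
    return lo - offset, hi + offset


# Hypothetical calibration set: true days_to_peak and raw q10/q90 predictions.
y_cal = [10, 28, 35, 8, 50, 14, 27, 31, 5]
lo_cal = [5, 15, 30, 10, 30, 10, 20, 25, 2]
hi_cal = [12, 25, 40, 20, 45, 18, 30, 35, 9]

offset = cqr_offset(y_cal, lo_cal, hi_cal, alpha=0.2)
print("offset:", offset)
print("calibrated interval:", widen(4, 20, offset))
```

In this toy set the raw intervals contain 6 of 9 true values; after widening by the computed offset they contain 8 of 9. A negative offset would shrink intervals that over-cover.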

Citation

If you use this model, please cite the underlying data:

@article{rehemtulla2024btsbot,
  title={The Zwicky Transient Facility Bright Transient Survey. III. BTSbot: Automated Identification and Follow-up of Bright Transients with Deep Learning},
  author={Rehemtulla, Nabeel and others},
  journal={arXiv preprint arXiv:2401.15167},
  year={2024}
}
