Pre-training Epidemic Time Series Forecasters with Compartmental Prototypes
Accurate epidemic forecasting is crucial for outbreak preparedness, but existing data-driven models are often brittle. Because they are typically trained on a single pathogen, they struggle with data scarcity during new outbreaks and fail under distribution shifts caused by viral evolution or interventions. However, decades of surveillance data across diverse diseases, together with the many compartmental models designed for them, offer an untapped source of transferable knowledge. To leverage these collective lessons from history, we propose CAPE, the first open-source pre-trained model for epidemic forecasting. Unlike existing time series foundation models, which overlook epidemiological challenges, CAPE represents epidemic dynamics as mixtures of latent compartmental population states, termed compartmental prototypes. It learns a flexible dictionary of compartmental prototypes directly from a large collection of simulation data, so that each outbreak can be expressed as a time-varying mixture that links observed infections to latent population states. To promote robust generalization, CAPE adopts the next-token-prediction paradigm during pre-training, combined with lightweight epidemic-aware regularization that aligns the learned prototypes with epidemiological semantics. On a comprehensive benchmark spanning 17 diseases, CAPE significantly outperforms strong baselines in zero-shot forecasting. This work represents a principled step toward pre-trained epidemic models that are both transferable and epidemiologically grounded. Our code is available at https://github.com/nuuuh/CAPE.
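The abstract names three ingredients: a pre-training corpus simulated from compartmental models, a latent state expressed as a mixture over a dictionary of prototypes, and next-token prediction. The toy sketch below illustrates each one in plain Python. It is not CAPE's implementation, and the abstract does not specify its architecture. Several choices are assumptions made only for illustration: a discrete-time SIR model as the simulator, fixed prototype vectors in place of learned ones, a softmax over per-step logits to form the mixture, and non-overlapping patches as "tokens". All function names (`simulate_sir`, `mix_prototypes`, `next_token_pairs`) are hypothetical.

```python
import math
import random


def simulate_sir(beta, gamma, n_days, i0=1e-3):
    """Toy discrete-time (forward-Euler, dt = 1 day) SIR model on population
    fractions. Returns daily new infections, the 'observed' signal.
    Assumption: SIR stands in for the compartmental simulators used in pre-training."""
    s, i, r = 1.0 - i0, i0, 0.0
    incidence = []
    for _ in range(n_days):
        new_inf = beta * s * i   # S -> I flow
        new_rec = gamma * i      # I -> R flow
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        incidence.append(new_inf)
    return incidence


def softmax(xs):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def mix_prototypes(logits, prototypes):
    """Latent state at one time step as a convex combination of prototype
    vectors (hypothetical; in a learned model both logits and prototypes
    would be trained). Calling this per step with different logits gives
    a time-varying mixture."""
    w = softmax(logits)
    dim = len(prototypes[0])
    state = [sum(w[k] * prototypes[k][d] for k in range(len(prototypes)))
             for d in range(dim)]
    return state, w


def next_token_pairs(series, patch):
    """Split a series into non-overlapping patches ('tokens'); each patch is
    the input and the following patch is the prediction target."""
    patches = [series[j:j + patch] for j in range(0, len(series) - patch + 1, patch)]
    return list(zip(patches[:-1], patches[1:]))


# Usage: build a small synthetic corpus with randomized transmission/recovery rates.
random.seed(0)
corpus = [simulate_sir(random.uniform(0.2, 0.6), random.uniform(0.05, 0.2), 120)
          for _ in range(50)]
pairs = [p for series in corpus for p in next_token_pairs(series, patch=7)]
print(len(pairs))  # 800 (17 weekly patches per 120-day series -> 16 pairs, times 50 series)
```

Sampling simulator parameters at random is one simple way to expose a pre-trained model to many epidemic regimes. The softmax keeps mixture weights on the probability simplex, so every latent state stays a weighted combination of the prototype vectors.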
