"""Init-source plumbing for GraphCast.

CRE's pipeline calls ``fetch_era5_for_graphcast(init_date)`` in
``src/prediction/graphcast_inference.py`` with ``init_date = today - 5``
to work around ARCO ERA5T's publication lag. Because of that lag, the
"5-day forecast" actually covers ``today-5`` through ``today-1``: it is
retroactive, not forward-looking. SMS alerts cannot warn workers ahead of
a heat event when the forecast window is already in the past.

This package provides an alternative: GFS (NOAA Global Forecast System)
analysis, available at ~3h lag. A GFS-sourced ``xarray.Dataset`` is
shape-compatible with the ERA5 Zarr ``full_ds`` used inside
``fetch_era5_for_graphcast``, so the Phase 2 integration is a one-line
source swap behind a config toggle.

Phase 1 status: this package is reachable only from tests. Nothing in the
production pipeline imports it. Wiring is deferred to Phase 2 so that the
module can be validated in isolation before deployment.

Public entry point:
    fetch_gfs_as_era5(target_date: str) -> xarray.Dataset

The sister module in Weather AI 2 (``~/weather AI 2/src/init_sources/``) is
the canonical copy; this CRE copy is identical code and is kept in step by
manual sync. If/when init_sources gains a third user, extract it into a
shared pip-installable package.
"""
from __future__ import annotations

from src.init_sources.gfs import fetch_gfs_as_era5  # noqa: F401
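

# Illustrative sketch only; nothing in the pipeline calls this.
# The docstring above notes that an ERA5T run initialised at ``today - 5``
# covers ``today-5`` through ``today-1``. The hypothetical helper below
# (``_forecast_window`` is not an existing CRE function) spells out that date
# arithmetic. It assumes the window starts on the init date itself and spans
# ``horizon_days`` calendar days, which matches the range quoted above.
from datetime import date, timedelta


def _forecast_window(init_date: date, horizon_days: int = 5) -> tuple[date, date]:
    """Return the first and last calendar day covered by a forecast."""
    if horizon_days < 1:
        raise ValueError("horizon_days must be at least 1")
    # Day 0 of the window is the init date, so the last day is horizon - 1 later.
    return init_date, init_date + timedelta(days=horizon_days - 1)


# Usage, assuming a GFS analysis for the current day is already available
# (plausible at ~3h lag, but not guaranteed):
#   _forecast_window(date.today() - timedelta(days=5))  # ERA5T: ends yesterday
#   _forecast_window(date.today())                      # GFS: ends in four days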