Real HRRR + true per-zone ISO-NE + 7-day rolling backtest from data repo
- README.md +26 -17
- about.md +44 -24
- app.py +237 -139
- hrrr_fetch.py +363 -0
- iso_ne_fetch.py +142 -196
- iso_ne_zonal.py +239 -0
- model_utils.py +38 -9
- packages.txt +2 -0
- requirements.txt +10 -3
README.md
CHANGED
|
@@ -14,19 +14,21 @@ short_description: Real-time day-ahead demand forecasting for ISO New England
|
|
| 14 |
|
| 15 |
# ⚡ Multi-Modal Deep Learning for Energy Demand Forecasting
|
| 16 |
|
| 17 |
-
Live demo of
|
| 18 |
|
| 19 |
-
|
| 20 |
-
2. **Ensemble (Baseline ⊕ Chronos-Bolt-mini, zero-shot, per-zone α)** — adds the 21 M-param Amazon foundation model on demand history alone (no weather, no fine-tuning) and reaches **4.21 % MAPE** in offline evaluation.
|
| 21 |
|
| 22 |
-
|
| 23 |
|
| 24 |
-
|
| 25 |
-
2. **Backtest tab**: 7 pre-computed daily forecasts (Dec 25–31, 2022 at 00:00 UTC) with all three models side-by-side and a per-zone MAPE table. The baseline curves there were computed on the Tufts HPC cluster with **real HRRR weather**, so this tab reaches the headline accuracy that the live tab can't get without weather inputs.
|
| 26 |
|
| 27 |
-
##
|
| 28 |
|
| 29 |
-
|
|
|
|
| 30 |
|
| 31 |
## Links
|
| 32 |
|
|
@@ -47,18 +49,25 @@ python app.py # http://localhost:7860
|
|
| 47 |
| File | Purpose |
|
| 48 |
|---|---|
|
| 49 |
| `app.py` | Gradio Blocks UI + Real-time / Backtest / About tabs |
|
| 50 |
-
| `iso_ne_fetch.py` |
|
| 51 |
| `calendar_features.py` | 44-d calendar one-hot encoder |
|
| 52 |
-
| `model_utils.py` |
|
| 53 |
| `models/cnn_transformer_baseline.py` | Baseline architecture (1.75 M params) |
|
| 54 |
| `checkpoints/best.pt` | Trained baseline weights (~20 MB) |
|
| 55 |
-
| `checkpoints/norm_stats.pt` | z-score statistics
|
| 56 |
-
| `assets/
|
| 57 |
-
| `assets/` |
|
| 58 |
| `about.md` | Demo explanation rendered in the UI |
|
|
|
|
| 59 |
|
| 60 |
-
##
|
| 61 |
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 14 |
|
| 15 |
# ⚡ Multi-Modal Deep Learning for Energy Demand Forecasting
|
| 16 |
|
| 17 |
+
Live demo of the trained CNN-Transformer baseline (1.75 M params) from our CS-137 final project (Tufts, Spring 2026), blended in a per-zone weighted ensemble with **Chronos-Bolt-mini** (Amazon, 21 M params, zero-shot on demand history).
|
| 18 |
|
| 19 |
+
**All inputs are now fully real** — no synthetic weather, no proportionally-split system demand:
|
|
|
|
| 20 |
|
| 21 |
+
- HRRR f00 weather analyses for the past 24 h (NOAA AWS S3, public)
|
| 22 |
+
- HRRR f01..f24 forecast for the future 24 h (most recent long cycle ≤ T-2h)
|
| 23 |
+
- True per-zone load from ISO-NE's public 5-minute zonal estimated load feed
|
| 24 |
+
- Calendar features (deterministic from timestamps)
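The forecast-cycle rule in the list above ("most recent long cycle ≤ T-2h") can be sketched as follows. This is an illustration only, not the code in `hrrr_fetch.py`. It assumes the "long" HRRR cycles are the 00/06/12/18 UTC runs and that the 2-hour offset is a publication-lag allowance; `pick_long_cycle` and `LONG_CYCLE_HOURS` are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumption: the extended-range ("long") HRRR cycles start at these UTC hours.
LONG_CYCLE_HOURS = (0, 6, 12, 18)


def pick_long_cycle(target: datetime, lag_hours: int = 2):
    """Return (cycle, fxx_start) for a forecast issued at `target` (on the hour).

    cycle     : latest long cycle initialised at or before target - lag_hours
    fxx_start : lead hour of that cycle whose valid time equals `target`
    """
    cycle = target - timedelta(hours=lag_hours)
    while cycle.hour not in LONG_CYCLE_HOURS:
        cycle -= timedelta(hours=1)
    fxx_start = int((target - cycle).total_seconds() // 3600)
    return cycle, fxx_start


cycle, fxx_start = pick_long_cycle(datetime(2026, 5, 1, 15))
print(cycle, fxx_start)  # 2026-05-01 12:00:00 3
```

Under these assumptions a forecast issued at 15:00 UTC would read lead hours f03 onward from the 12:00 UTC cycle, which matches the `f{fxx_start}..` range that the live summary prints.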
|
| 25 |
|
| 26 |
+
Headline offline numbers: **5.24 % MAPE** (baseline) and **4.21 % MAPE** (ensemble), both obtained with HRRR analyses supplied for the future window, as during training (see the disclosure in `about.md`). Live MAPE will be modestly worse because deployment substitutes HRRR forecasts for the future window.
|
|
|
|
| 27 |
|
| 28 |
+
## What it does
|
| 29 |
|
| 30 |
+
1. **Real-time tab**: every click pulls real ISO-NE per-zone demand + real HRRR weather and runs the ensemble. Expect ~3-5 min on the very first click of a fresh Space (cold HRRR cache + Chronos load), then ~10-30 s on subsequent clicks within the same uptime session.
|
| 31 |
+
2. **Backtest tab**: 7 daily forecasts on the most recent fully-published days, with full predict-vs-truth comparisons + per-zone MAPE table. Refreshed daily by a GitHub Actions cron in the [auxiliary data repo](https://github.com/jeffliulab/new-england-real-time-power-predict-data).
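For readers unfamiliar with the metric, the per-zone MAPE table can be reproduced from an hours × zones truth/prediction pair roughly as below. This is a minimal sketch in plain Python: the function names and the two placeholder zone labels are hypothetical, and how the Space aggregates the "overall" column is not shown here.

```python
def mape_percent(truth, pred):
    """Mean absolute percentage error, in percent, over paired values."""
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and the same length")
    return 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(truth, pred)) / len(truth)


def per_zone_mape(truth_24h, pred_24h, zones):
    """truth_24h / pred_24h: one row per hour, one column per zone (MW)."""
    return {
        zone: mape_percent([row[i] for row in truth_24h],
                           [row[i] for row in pred_24h])
        for i, zone in enumerate(zones)
    }


# Two hours, two placeholder zones.
truth = [[100.0, 200.0], [100.0, 200.0]]
pred = [[110.0, 190.0], [90.0, 200.0]]
scores = per_zone_mape(truth, pred, ["ZONE_A", "ZONE_B"])
print({zone: round(value, 2) for zone, value in scores.items()})
```

The error is divided by the true load hour by hour, so a 10 MW miss counts for more in a small zone than in a large one.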
|
| 32 |
|
| 33 |
## Links
|
| 34 |
|
|
|
|
| 49 |
| File | Purpose |
|
| 50 |
|---|---|
|
| 51 |
| `app.py` | Gradio Blocks UI + Real-time / Backtest / About tabs |
|
| 52 |
+
| `iso_ne_fetch.py` | High-level demand fetcher: live ISO-NE 5-min → hourly + bundled CSV fallback + 30-day data-repo cache |
|
| 53 |
+
| `iso_ne_zonal.py` | Low-level ISO-NE 5-minute zonal CSV fetcher (cookie-prime) |
|
| 54 |
+
| `hrrr_fetch.py` | Real-time HRRR weather fetcher (Herbie + AWS S3 + KDTree-based regrid + `/tmp` cache) |
|
| 55 |
| `calendar_features.py` | 44-d calendar one-hot encoder |
|
| 56 |
+
| `model_utils.py` | Model loading + inference + Chronos ensemble |
|
| 57 |
| `models/cnn_transformer_baseline.py` | Baseline architecture (1.75 M params) |
|
| 58 |
| `checkpoints/best.pt` | Trained baseline weights (~20 MB) |
|
| 59 |
+
| `checkpoints/norm_stats.pt` | z-score statistics (weather + energy) |
|
| 60 |
+
| `assets/` | Figures shown in the *About* tab + bundled fallback samples |
|
| 61 |
+
| `assets/backtest_fallback.json` | Last-known-good backtest snapshot (used if data repo unreachable) |
|
| 62 |
| `about.md` | Demo explanation rendered in the UI |
|
| 63 |
+
| `packages.txt` | apt-style packages: `libeccodes-dev`, `libeccodes-tools` (for cfgrib) |
|
| 64 |
|
| 65 |
+
## No secrets required
|
| 66 |
|
| 67 |
+
The Space pulls real data from public, no-auth endpoints:
|
| 68 |
+
- ISO-NE: `https://www.iso-ne.com/transform/csv/fiveminuteestimatedzonalload?start=...&end=...` (with browser-cookie prime; see `iso_ne_zonal.py`)
|
| 69 |
+
- HRRR: `s3://noaa-hrrr-bdp-pds/hrrr.{date}/conus/...` via the Herbie library
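The "browser-cookie prime" mentioned for the ISO-NE endpoint can be pictured as two requests on one session: an ordinary page view that collects cookies, then the CSV request. The sketch below is hypothetical and is not the code in `iso_ne_zonal.py`. In particular, the priming URL (`ISO_NE_HOME`) is an assumption, and `start` / `end` are passed through untouched because their format is not given here.

```python
ISO_NE_HOME = "https://www.iso-ne.com/"  # assumption: any regular page that sets cookies
ZONAL_CSV = "https://www.iso-ne.com/transform/csv/fiveminuteestimatedzonalload"


def fetch_zonal_csv(session, start: str, end: str, timeout: float = 30.0) -> str:
    """Prime cookies with one page view, then request the CSV on the same session.

    `session` is any object with a requests.Session-style .get(); passing the
    same session to both calls is what carries the cookies across.
    """
    session.get(ISO_NE_HOME, timeout=timeout).raise_for_status()
    resp = session.get(ZONAL_CSV, params={"start": start, "end": end},
                       timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

In practice one would pass a `requests.Session()` (possibly with a browser-like `User-Agent` header); a stub session with the same `.get()` signature is enough to exercise the call order offline.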
|
| 70 |
+
|
| 71 |
+
The Backtest tab loads pre-built JSON from the auxiliary data repo
|
| 72 |
+
[`new-england-real-time-power-predict-data`](https://github.com/jeffliulab/new-england-real-time-power-predict-data),
|
| 73 |
+
also public; no auth needed.
|
about.md
CHANGED
|
@@ -1,48 +1,68 @@
|
|
| 1 |
## About this demo
|
| 2 |
|
| 3 |
-
This Space runs
|
| 4 |
|
| 5 |
-
|
| 6 |
-
2. **Ensemble (Baseline + Chronos-Bolt-mini)** — late-fusion of the baseline with [Chronos-Bolt-mini](https://huggingface.co/amazon/chronos-bolt-mini) (Amazon, 21 M params, Apache-2.0), used **zero-shot on demand history only** — no weather, no fine-tuning. Reaches **4.21 % MAPE** on the same offline slice and is the recommended path for this demo.
|
| 7 |
|
| 8 |
-
|
|
|
|
| 9 |
|
| 10 |
-
###
|
| 11 |
|
| 12 |
-
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
|
| 17 |
-
|---|---|---|---|
|
| 18 |
-
| Baseline weights | ✅ | ✅ | ✅ |
|
| 19 |
-
| Calendar features | ✅ | ✅ | ✅ |
|
| 20 |
-
| Demand history | ✅ live ISO-NE (or 2022 fallback) | ✅ | ✅ |
|
| 21 |
-
| **Weather inputs to baseline** | ❌ zeros (training-mean) | ❌ zeros (training-mean) | ✅ real HRRR rasters |
|
| 22 |
-
| Chronos-Bolt-mini (zero-shot, demand only) | — | ✅ | — |
|
| 23 |
|
| 24 |
-
|
| 25 |
|
| 26 |
-
|
| 27 |
|
| 28 |
-
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
-
|
| 33 |
|
| 34 |
-
|
| 35 |
|
| 36 |
-
|
| 37 |
|
| 38 |
-
|
|
|
|
|
|
|
| 39 |
|
| 40 |
### First-call latency
|
| 41 |
|
| 42 |
-
|
| 43 |
|
| 44 |
### Links
|
| 45 |
|
| 46 |
- 📄 [Final report (PDF)](https://github.com/jeffliulab/real-time-power-predict/blob/main/report/final_report.pdf)
|
| 47 |
-
- 💻 [
|
|
|
|
| 48 |
- 👤 Author: **Pang Liu** · `pliu07` · Tufts CS-137
|
|
|
|
| 1 |
## About this demo
|
| 2 |
|
| 3 |
+
This Space runs the trained CNN-Transformer baseline from our CS-137 final project on **fully real, live ISO New England inputs**, blended with **Chronos-Bolt-mini** (Amazon, 21 M params, Apache-2.0, zero-shot on demand history alone) in a per-zone weighted ensemble.
|
| 4 |
|
| 5 |
+
There are two tabs:
|
|
|
|
| 6 |
|
| 7 |
+
1. **Real-time forecast** — every click pulls the latest 24 h of demand and HRRR weather, plus a 24 h HRRR forecast cycle, and produces a 24-hour 8-zone prediction.
|
| 8 |
+
2. **Backtest (last 7 days)** — 7 daily forecasts on the most recent 7 fully-published days, refreshed every day at 04:00 UTC by a GitHub Actions cron in [`new-england-real-time-power-predict-data`](https://github.com/jeffliulab/new-england-real-time-power-predict-data).
|
| 9 |
|
| 10 |
+
### What's real (everything)
|
| 11 |
|
| 12 |
+
| Component | Source | Real or synthetic? |
|
| 13 |
+
|---|---|---|
|
| 14 |
+
| Per-zone demand history (24 h) | ISO-NE public 5-min `fiveminuteestimatedzonalload` feed → hourly mean | ✅ live (~1-2 h publication lag) |
|
| 15 |
+
| Chronos context (720 h history) | Same ISO-NE feed (data repo cache + live splice) | ✅ live |
|
| 16 |
+
| Weather history (24 h, 7 channels) | NOAA HRRR f00 analyses on AWS S3 (`noaa-hrrr-bdp-pds`) via Herbie | ✅ live |
|
| 17 |
+
| Weather forecast (24 h, 7 channels) | NOAA HRRR cycle T-1's f01..f24 forecasts | ✅ live |
|
| 18 |
+
| Calendar features | Computed deterministically from timestamps | ✅ |
|
| 19 |
+
| Baseline weights | Trained on 2019–2022 data | ✅ |
|
| 20 |
+
| Chronos-Bolt-mini | Amazon, zero-shot, no fine-tuning | ✅ |
|
| 21 |
|
| 22 |
+
The bundled 2022 sample CSVs are kept ONLY as a final fallback for when the live ISO-NE / HRRR endpoints are unreachable.
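The "5-min feed → hourly mean" step in the table above amounts to bucketing samples by the hour they fall in and averaging each bucket. A dependency-free sketch follows; the Space itself works with pandas, and both the function name and the convention of labelling each bucket by its starting hour are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def hourly_mean(samples):
    """samples: iterable of (timestamp, MW) pairs. Returns {hour_start: mean MW}."""
    buckets = defaultdict(list)
    for ts, mw in samples:
        # Truncate to the top of the hour so 10:00..10:55 share one bucket.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(mw)
    return {hour: fmean(values) for hour, values in sorted(buckets.items())}


five_min = [
    (datetime(2026, 5, 1, 10, 0), 100.0),
    (datetime(2026, 5, 1, 10, 5), 110.0),
    (datetime(2026, 5, 1, 11, 0), 90.0),
]
hourly = hourly_mean(five_min)
print(hourly[datetime(2026, 5, 1, 10)])  # 105.0
```

An hour with missing 5-minute samples still gets a value here (the mean of whatever arrived); a stricter variant could reject hours with too few samples.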
|
| 23 |
|
| 24 |
+
### Strict-discipline backtest
|
| 25 |
|
| 26 |
+
For each daily forecast at time **T** (the last 7 days at 00:00 UTC each):
|
| 27 |
|
| 28 |
+
- **Demand history** for hours [T-24, T-1] comes from the public 5-min zonal feed
|
| 29 |
+
- **Weather history** is 24 HRRR f00 analyses, one per valid hour [T-24, T-1]
|
| 30 |
+
- **Weather forecast** is HRRR cycle (T-1)'s f01..f24 — i.e. the most recent forecast that was issued *before* T, with valid hours [T, T+23]
|
| 31 |
+
- **Truth** for MAPE is the ISO-NE per-zone load for [T, T+23]
|
| 32 |
|
| 33 |
+
In particular **no future analyses are used** — every forecast at T sees only data that would have been available at time T, matching what a real deployment would do.
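The window bookkeeping described above can be sketched as a small helper that lists the valid hours on each side of **T** and refuses a forecast cycle that was not issued before T. The function name and return layout are illustrative, not taken from the backtest code.

```python
from datetime import datetime, timedelta


def strict_windows(T, cycle, history_len=24, future_len=24):
    """Return (history_valid, future_valid, lead_hours) for a forecast at T.

    history_valid : valid hours T-24 .. T-1 (demand + f00 analyses)
    future_valid  : valid hours T .. T+23 (forecast inputs and truth)
    lead_hours    : forecast lead (fxx) of `cycle` for each future valid hour
    """
    if cycle >= T:
        raise ValueError("forecast cycle must be issued strictly before T")
    history_valid = [T - timedelta(hours=h) for h in range(history_len, 0, -1)]
    future_valid = [T + timedelta(hours=h) for h in range(future_len)]
    lead_hours = [int((v - cycle).total_seconds() // 3600) for v in future_valid]
    return history_valid, future_valid, lead_hours


T = datetime(2026, 1, 8, 0)
hist, fut, fxx = strict_windows(T, cycle=T - timedelta(hours=1))
print(hist[0], hist[-1])  # 2026-01-07 00:00:00 2026-01-07 23:00:00
print(fut[0], fut[-1])    # 2026-01-08 00:00:00 2026-01-08 23:00:00
print(fxx[0], fxx[-1])    # 1 24
```

With `cycle = T - 1 h` the lead hours come out as f01..f24, as in the bullets; an earlier cycle simply shifts the lead range upward while the valid hours stay the same.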
|
| 34 |
|
| 35 |
+
### Disclosure: training-time `future_weather` mismatch
|
| 36 |
|
| 37 |
+
The trained baseline saw **HRRR f00 analyses for both history AND future windows during training** (i.e. the model was given the actual weather that occurred during the prediction window as an *input* feature). This is a form of supervised-learning-with-privileged-information that the report acknowledges in §4.1.5 / §5.
|
| 38 |
|
| 39 |
+
At deployment we cannot use future analyses (they do not exist yet), so we substitute HRRR forecasts (`f01..f24`) issued at the cycle just before the forecast time. The model therefore sees a slightly out-of-distribution input for the future window. **MAPE on this real-time / strict-backtest setup will be modestly worse than the offline 5.24 % headline**, which used analyses for both windows.
|
| 40 |
|
| 41 |
+
This Space measures the deployable accuracy honestly. The Chronos-Bolt-mini ensemble path partially compensates because Chronos doesn't use weather at all.
|
| 42 |
|
| 43 |
+
### Per-zone allocation — actually per-zone now
|
| 44 |
+
|
| 45 |
+
Earlier prototypes of this demo used a fixed proportion vector to split the system total (from the EIA Open Data API) into 8 zones, which made the per-zone view cosmetic. The current Space pulls **true per-zone load** from ISO-NE's 5-minute estimated zonal feed, so per-zone forecasts are real.
|
| 46 |
|
| 47 |
### First-call latency
|
| 48 |
|
| 49 |
+
The first click on the **Real-time forecast** tab triggers:
|
| 50 |
+
1. ~24 HRRR analysis cycles + 24 HRRR forecast hours from AWS S3 (parallel-fetched, cached at `/tmp/hrrr_cache/`)
|
| 51 |
+
2. One Chronos-Bolt-mini load (~80 MB from HuggingFace Hub)
|
| 52 |
+
|
| 53 |
+
Expect **~3-5 minutes on the very first click** of a fresh Space instance, and ~10-30 s on subsequent clicks within the same uptime session. The Backtest tab is instant — its data ships pre-computed from the data repo.
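The gap between the first click and later ones comes from on-disk caching: a file fetched once is reused for as long as the Space instance stays up. A minimal sketch of that pattern is shown below. `cached_fetch`, the key naming, and the temporary directory are hypothetical; the real cache lives under `/tmp/hrrr_cache/` and is managed by `hrrr_fetch.py`.

```python
import tempfile
from pathlib import Path


def cached_fetch(key: str, fetch_fn, cache_dir: Path) -> bytes:
    """Return cached bytes for `key`, calling fetch_fn() only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / key
    if path.exists():
        return path.read_bytes()
    data = fetch_fn()
    # Write to a side file first, then rename, so a crash mid-write never
    # leaves a truncated file under the final name.
    tmp = path.parent / (path.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data


calls = []

def slow_download() -> bytes:
    calls.append(1)          # stands in for an S3 download
    return b"grib-bytes"

cache = Path(tempfile.mkdtemp())
first = cached_fetch("example_cycle_f00.bin", slow_download, cache)
second = cached_fetch("example_cycle_f00.bin", slow_download, cache)
print(len(calls))  # 1
```

Because `/tmp` does not survive a Space restart, a fresh instance starts with an empty cache again, which is why the slow path reappears after the Space has been asleep.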
|
| 54 |
+
|
| 55 |
+
### Per-zone ensemble weights
|
| 56 |
+
|
| 57 |
+
Per-zone $\alpha_z$ (shown beneath the chart) blends baseline and Chronos:
|
| 58 |
+
|
| 59 |
+
$$\hat y_z = \alpha_z \cdot \hat y_z^{\text{baseline}} + (1 - \alpha_z) \cdot \hat y_z^{\text{Chronos}}$$
|
| 60 |
+
|
| 61 |
+
$\alpha_z$ values come from a grid search on a 14-day validation window in 2022. See Table 10 of the report for the underlying ablation.
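To make the blend and the weight selection concrete, here is a small dependency-free sketch. It is not the `per_zone_ensemble` implementation from `model_utils.py`: the function names, the 0.05 grid step, and the use of MAPE as the selection criterion are assumptions for illustration.

```python
def blend(baseline, chronos, alpha):
    """Apply y = alpha_z * baseline + (1 - alpha_z) * chronos per zone.

    baseline / chronos: one row per hour, one column per zone (MW).
    alpha: one weight per zone, each in [0, 1].
    """
    return [
        [a * b + (1.0 - a) * c for a, b, c in zip(alpha, b_row, c_row)]
        for b_row, c_row in zip(baseline, chronos)
    ]


def grid_search_alpha(truth, baseline, chronos, steps=20):
    """Pick alpha for ONE zone from {0, 1/steps, ..., 1} by lowest MAPE.

    truth / baseline / chronos: hourly values for that zone on a validation window.
    """
    def mape(alpha):
        return sum(
            abs(alpha * b + (1.0 - alpha) * c - t) / abs(t)
            for t, b, c in zip(truth, baseline, chronos)
        ) / len(truth)

    return min((k / steps for k in range(steps + 1)), key=mape)


base = [[100.0, 50.0], [120.0, 60.0]]
chron = [[80.0, 70.0], [100.0, 80.0]]
print(blend(base, chron, [0.5, 0.5]))  # [[90.0, 60.0], [110.0, 70.0]]
```

A weight of 1 reproduces the baseline for that zone and 0 reproduces Chronos, so the search degrades gracefully to whichever single model is better on the validation window.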
|
| 62 |
|
| 63 |
### Links
|
| 64 |
|
| 65 |
- 📄 [Final report (PDF)](https://github.com/jeffliulab/real-time-power-predict/blob/main/report/final_report.pdf)
|
| 66 |
+
- 💻 [Main code repo](https://github.com/jeffliulab/real-time-power-predict)
|
| 67 |
+
- 🤖 [Auxiliary data repo (cron-refreshed backtest data)](https://github.com/jeffliulab/new-england-real-time-power-predict-data)
|
| 68 |
- 👤 Author: **Pang Liu** · `pliu07` · Tufts CS-137
|
app.py
CHANGED
|
@@ -1,40 +1,41 @@
|
|
| 1 |
-
"""Gradio Space:
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
- Pre-computed 7-day backtest (Dec 25-31, 2022) showing all three
|
| 23 |
-
models' forecasts vs. ground truth, with per-zone and overall MAPE.
|
| 24 |
-
- The baseline forecasts in this cache use REAL HRRR weather (computed
|
| 25 |
-
on the cluster), so this tab demonstrates the headline accuracy
|
| 26 |
-
that the live tab can't reach without weather inputs.
|
| 27 |
"""
|
| 28 |
|
| 29 |
from __future__ import annotations
|
| 30 |
|
| 31 |
import json
|
|
|
|
|
|
|
|
|
|
| 32 |
from datetime import datetime, timedelta, timezone
|
| 33 |
from pathlib import Path
|
|
|
|
| 34 |
|
| 35 |
import gradio as gr
|
| 36 |
import numpy as np
|
|
|
|
| 37 |
import plotly.graph_objects as go
|
|
|
|
| 38 |
from plotly.subplots import make_subplots
|
| 39 |
|
| 40 |
from calendar_features import encode_range
|
|
@@ -46,12 +47,23 @@ from model_utils import (
|
|
| 46 |
run_forecast,
|
| 47 |
per_zone_ensemble,
|
| 48 |
ALPHA_PER_ZONE_MINI,
|
| 49 |
)
|
| 50 |
|
| 51 |
ROOT = Path(__file__).parent
|
| 52 |
ASSETS = ROOT / "assets"
|
| 53 |
ABOUT = (ROOT / "about.md").read_text()
|
| 54 |
-
|
| 55 |
|
| 56 |
NAVY = "#1A3A5C"
|
| 57 |
ACCENT = "#2E86DE"
|
|
@@ -63,7 +75,7 @@ print("Loading baseline checkpoint...")
|
|
| 63 |
MODEL, NORM_STATS = load_baseline(ROOT / "checkpoints" / "best.pt", device="cpu")
|
| 64 |
print(f"Loaded baseline ({sum(p.numel() for p in MODEL.parameters()):,} params)")
|
| 65 |
|
| 66 |
-
# Lazy-loaded Chronos pipeline (
|
| 67 |
_CHRONOS = {"pipeline": None}
|
| 68 |
|
| 69 |
|
|
@@ -75,67 +87,163 @@ def _get_chronos():
|
|
| 75 |
return _CHRONOS["pipeline"]
|
| 76 |
|
| 77 |
|
| 78 |
def _now_utc_hour() -> datetime:
|
| 79 |
-
return datetime.now(timezone.utc).replace(
|
|
|
|
| 80 |
|
| 81 |
|
| 82 |
# =====================================================================
|
| 83 |
-
#
|
| 84 |
# =====================================================================
|
| 85 |
|
| 86 |
-
def
|
| 87 |
-
"""
|
| 88 |
target = _now_utc_hour()
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
|
| 118 |
summary = (
|
| 119 |
-
f"{
|
| 120 |
-
f"
|
| 121 |
-
f"
|
| 122 |
-
f"
|
| 123 |
-
f"
|
|
| 124 |
)
|
|
|
|
| 125 |
return line, bar, summary
|
| 126 |
|
| 127 |
|
| 128 |
-
def
|
| 129 |
-
|
| 130 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 131 |
fig = make_subplots(rows=4, cols=2, shared_xaxes=False,
|
| 132 |
-
|
| 133 |
-
|
| 134 |
hist_t = [target - timedelta(hours=24 - i) for i in range(24)]
|
| 135 |
fut_t = [target + timedelta(hours=i + 1) for i in range(24)]
|
| 136 |
-
overlay = overlay or {}
|
| 137 |
overlay_palette = [GREY, TEAL, AMBER]
|
| 138 |
-
|
| 139 |
for i, zone in enumerate(ZONE_COLS):
|
| 140 |
r, c = i // 2 + 1, i % 2 + 1
|
| 141 |
fig.add_trace(go.Scatter(
|
|
@@ -146,18 +254,17 @@ def _line_plot(target: datetime, hist: np.ndarray, pred: np.ndarray,
|
|
| 146 |
fig.add_trace(go.Scatter(
|
| 147 |
x=fut_t, y=pred[:, i], mode="lines",
|
| 148 |
line=dict(color=ACCENT, width=2.5, dash="dash"),
|
| 149 |
-
name="forecast (
|
| 150 |
), row=r, col=c)
|
| 151 |
for k, (label, arr) in enumerate(overlay.items()):
|
| 152 |
-
colour = overlay_palette[k % len(overlay_palette)]
|
| 153 |
fig.add_trace(go.Scatter(
|
| 154 |
x=fut_t, y=arr[:, i], mode="lines",
|
| 155 |
-
line=dict(color=
|
| 156 |
name=label, showlegend=(i == 0),
|
| 157 |
opacity=0.85,
|
| 158 |
), row=r, col=c)
|
| 159 |
fig.add_vline(x=target, line=dict(color="grey", width=1, dash="dot"),
|
| 160 |
-
|
| 161 |
fig.update_layout(
|
| 162 |
title="Per-zone demand: history (solid) and 24-h forecast (dashed)",
|
| 163 |
height=820, plot_bgcolor="white",
|
|
@@ -169,8 +276,7 @@ def _line_plot(target: datetime, hist: np.ndarray, pred: np.ndarray,
|
|
| 169 |
return fig
|
| 170 |
|
| 171 |
|
| 172 |
-
def
|
| 173 |
-
"""Horizontal bar: predicted demand at target+1h, sorted."""
|
| 174 |
order = np.argsort(next_hour_pred)
|
| 175 |
fig = go.Figure(go.Bar(
|
| 176 |
x=next_hour_pred[order], y=[ZONE_COLS[i] for i in order],
|
|
@@ -179,7 +285,8 @@ def _bar_plot(target: datetime, next_hour_pred: np.ndarray):
|
|
| 179 |
textposition="outside",
|
| 180 |
))
|
| 181 |
fig.update_layout(
|
| 182 |
-
title=f"Predicted demand at t+1h
|
|
|
|
| 183 |
xaxis_title="MW", height=350, plot_bgcolor="white",
|
| 184 |
margin=dict(l=80, r=40, t=60, b=40),
|
| 185 |
)
|
|
@@ -192,28 +299,20 @@ def _alpha_table_md() -> str:
|
|
| 192 |
|
| 193 |
|
| 194 |
# =====================================================================
|
| 195 |
-
# Backtest tab (
|
| 196 |
# =====================================================================
|
| 197 |
|
| 198 |
-
if BACKTEST_JSON.exists():
|
| 199 |
-
BACKTEST = json.loads(BACKTEST_JSON.read_text())
|
| 200 |
-
else:
|
| 201 |
-
BACKTEST = None
|
| 202 |
-
print(f"WARNING: backtest cache not found at {BACKTEST_JSON}")
|
| 203 |
-
|
| 204 |
-
|
| 205 |
def _backtest_overview_plot():
|
| 206 |
-
"""One row per zone, showing 7-day truth vs. each model's forecast."""
|
| 207 |
if BACKTEST is None:
|
| 208 |
return go.Figure()
|
| 209 |
forecasts = BACKTEST["forecasts"]
|
| 210 |
fig = make_subplots(rows=4, cols=2, shared_xaxes=False,
|
| 211 |
-
|
| 212 |
-
|
| 213 |
for i, zone in enumerate(ZONE_COLS):
|
| 214 |
r, c = i // 2 + 1, i % 2 + 1
|
| 215 |
for f in forecasts:
|
| 216 |
-
start = datetime.fromisoformat(f["start"])
|
| 217 |
t = [start + timedelta(hours=h) for h in range(24)]
|
| 218 |
truth = np.asarray(f["truth_24h"])[:, i]
|
| 219 |
base = np.asarray(f["baseline"])[:, i]
|
|
@@ -242,12 +341,16 @@ def _backtest_overview_plot():
|
|
| 242 |
line=dict(color=ACCENT, width=2, dash="dash"),
|
| 243 |
name="ensemble", showlegend=show,
|
| 244 |
), row=r, col=c)
|
|
|
|
| 245 |
fig.update_layout(
|
| 246 |
-
title="7-day backtest
|
|
|
|
|
|
|
|
|
|
| 247 |
height=900, plot_bgcolor="white",
|
| 248 |
margin=dict(l=40, r=20, t=80, b=40),
|
| 249 |
legend=dict(orientation="h", yanchor="bottom", y=1.02,
|
| 250 |
-
|
| 251 |
)
|
| 252 |
fig.update_yaxes(title_text="MW", title_standoff=4)
|
| 253 |
return fig
|
|
@@ -255,36 +358,45 @@ def _backtest_overview_plot():
|
|
| 255 |
|
| 256 |
def _backtest_summary_md() -> str:
|
| 257 |
if BACKTEST is None:
|
| 258 |
-
return "
|
|
|
|
|
|
|
|
|
|
|
|
|
| 259 |
s = BACKTEST["summary"]
|
|
|
|
| 260 |
rows = []
|
| 261 |
rows.append("| Model | " + " | ".join(ZONE_COLS) + " | **Overall** |")
|
| 262 |
rows.append("|---|" + "|".join(["---"] * (len(ZONE_COLS) + 1)) + "|")
|
| 263 |
for key, label in (("baseline", "Baseline (real HRRR)"),
|
| 264 |
-
|
| 265 |
-
|
| 266 |
per_zone = " | ".join(f"{s[key]['per_zone'][z]:.2f}" for z in ZONE_COLS)
|
| 267 |
rows.append(f"| {label} | {per_zone} | **{s[key]['overall']:.2f}** |")
|
| 268 |
table = "\n".join(rows)
|
|
|
|
| 269 |
return (
|
| 270 |
-
f"### 7
|
|
|
|
|
|
|
|
|
|
| 271 |
f"{table}\n\n"
|
| 272 |
-
f"_Each forecast
|
| 273 |
-
f"
|
| 274 |
-
f"
|
| 275 |
-
f"
|
| 276 |
-
f"
|
| 277 |
-
f"doesn't need weather at all._"
|
| 278 |
)
|
| 279 |
|
| 280 |
|
| 281 |
def _backtest_bars():
|
| 282 |
-
"""Bar chart: overall MAPE per model (averaged over 7 forecasts)."""
|
| 283 |
if BACKTEST is None:
|
| 284 |
return go.Figure()
|
| 285 |
s = BACKTEST["summary"]
|
| 286 |
-
labels = ["Baseline\n(real HRRR)", "Chronos-Bolt-mini\n(zero-shot)",
|
| 287 |
-
|
|
|
|
|
|
|
| 288 |
fig = go.Figure(go.Bar(
|
| 289 |
x=labels, y=values, marker_color=[GREY, TEAL, ACCENT],
|
| 290 |
text=[f"{v:.2f}%" for v in values], textposition="outside",
|
|
@@ -298,54 +410,42 @@ def _backtest_bars():
|
|
| 298 |
|
| 299 |
|
| 300 |
# =====================================================================
|
| 301 |
-
# Gradio
|
| 302 |
# =====================================================================
|
| 303 |
|
| 304 |
with gr.Blocks(title="ISO-NE Energy Demand Forecast",
|
| 305 |
-
|
| 306 |
gr.Markdown(
|
| 307 |
"# ⚡ Multi-Modal Deep Learning for Energy Demand Forecasting\n"
|
| 308 |
"**Author:** Pang Liu · Tufts CS-137 · "
|
| 309 |
"[GitHub](https://github.com/jeffliulab/real-time-power-predict)\n\n"
|
| 310 |
-
"
|
| 311 |
-
"
|
| 312 |
-
"
|
| 313 |
-
"
|
| 314 |
-
"
|
| 315 |
-
"
|
| 316 |
-
"only — no weather) and reaches **4.21 % MAPE** in our offline "
|
| 317 |
-
"evaluation. See the **Backtest** tab for a 7-day side-by-side "
|
| 318 |
-
"comparison and the **About** tab for full details."
|
| 319 |
)
|
| 320 |
with gr.Row():
|
| 321 |
-
|
| 322 |
-
|
| 323 |
-
"Ensemble (Baseline + Chronos-Bolt-mini)"],
|
| 324 |
-
value="Ensemble (Baseline + Chronos-Bolt-mini)",
|
| 325 |
-
label="Model",
|
| 326 |
-
scale=2,
|
| 327 |
-
)
|
| 328 |
-
run_btn = gr.Button("Forecast next 24 h (now)",
|
| 329 |
-
variant="primary", scale=1)
|
| 330 |
summary_md = gr.Markdown()
|
| 331 |
with gr.Tabs():
|
| 332 |
with gr.Tab("Real-time forecast"):
|
| 333 |
line_plot = gr.Plot(label="Per-zone history + forecast")
|
| 334 |
bar_plot = gr.Plot(label="Predicted next-hour demand")
|
| 335 |
gr.Markdown(_alpha_table_md())
|
| 336 |
-
with gr.Tab("Backtest (last 7 days
|
| 337 |
gr.Markdown(
|
| 338 |
-
"
|
| 339 |
-
"
|
| 340 |
-
"
|
| 341 |
-
"
|
| 342 |
-
"the **ensemble** is the per-zone weighted blend reported in "
|
| 343 |
-
"the paper."
|
| 344 |
)
|
| 345 |
backtest_plot = gr.Plot(value=_backtest_overview_plot(),
|
| 346 |
-
|
| 347 |
backtest_bars = gr.Plot(value=_backtest_bars(),
|
| 348 |
-
|
| 349 |
gr.Markdown(_backtest_summary_md())
|
| 350 |
with gr.Tab("About"):
|
| 351 |
gr.Markdown(ABOUT)
|
|
@@ -360,10 +460,8 @@ with gr.Blocks(title="ISO-NE Energy Demand Forecast",
|
|
| 360 |
label="Baseline CNN-Transformer architecture",
|
| 361 |
show_label=True)
|
| 362 |
|
| 363 |
-
run_btn.click(
|
| 364 |
outputs=[line_plot, bar_plot, summary_md])
|
| 365 |
-
demo.load(forecast, inputs=[model_choice],
|
| 366 |
-
outputs=[line_plot, bar_plot, summary_md])
|
| 367 |
|
| 368 |
|
| 369 |
if __name__ == "__main__":
|
|
|
|
| 1 |
+
"""Gradio Space: ISO-NE day-ahead demand forecasting (real-time + backtest).
|
| 2 |
+
|
| 3 |
+
Always-now real-time forecast on truly real inputs:
|
| 4 |
+
- HRRR f00 weather analyses for the past 24 h (NOAA AWS S3, public)
|
| 5 |
+
- HRRR forecast cycle T-1's f01..f24 for the future 24 h (no future
|
| 6 |
+
analyses are used — strict deployable forecast)
|
| 7 |
+
- Per-zone ISO-NE 5-minute estimated zonal load, rolled up to hourly
|
| 8 |
+
- Calendar features (deterministic from timestamps)
|
| 9 |
+
- Chronos-Bolt-mini zero-shot foundation-model ensemble
|
| 10 |
+
|
| 11 |
+
Backtest tab loads a 7-day rolling cache from the auxiliary data repo
|
| 12 |
+
(``new-england-real-time-power-predict-data``), refreshed daily by a
|
| 13 |
+
GitHub Actions cron. Cache is fetched once at Space startup; falls back
|
| 14 |
+
to a bundled snapshot if the data repo is unreachable.
|
| 15 |
+
|
| 16 |
+
Disclosure (also in about.md): the trained baseline saw f00 ANALYSES
|
| 17 |
+
for both history AND future windows during training (a form of data
|
| 18 |
+
leakage). At deployment we substitute HRRR f01..f24 forecasts for the
|
| 19 |
+
future window — there is no future-data leak, but the model sees a
|
| 20 |
+
slightly out-of-distribution input. Live MAPE will therefore be a bit
|
| 21 |
+
worse than the offline 5.24 % headline.
|
| 22 |
"""
|
| 23 |
|
| 24 |
from __future__ import annotations
|
| 25 |
|
| 26 |
import json
|
| 27 |
+
import os
|
| 28 |
+
import shutil
|
| 29 |
+
import time
|
| 30 |
from datetime import datetime, timedelta, timezone
|
| 31 |
from pathlib import Path
|
| 32 |
+
from typing import Optional
|
| 33 |
|
| 34 |
import gradio as gr
|
| 35 |
import numpy as np
|
| 36 |
+
import pandas as pd
|
| 37 |
import plotly.graph_objects as go
|
| 38 |
+
import requests
|
| 39 |
from plotly.subplots import make_subplots
|
| 40 |
|
| 41 |
from calendar_features import encode_range
|
|
|
|
| 47 |
run_forecast,
|
| 48 |
per_zone_ensemble,
|
| 49 |
ALPHA_PER_ZONE_MINI,
|
| 50 |
+
HISTORY_LEN,
|
| 51 |
+
FUTURE_LEN,
|
| 52 |
+
)
|
| 53 |
+
from hrrr_fetch import (
|
| 54 |
+
fetch_history as hrrr_fetch_history,
|
| 55 |
+
fetch_forecast_for_window,
|
| 56 |
)
|
| 57 |
|
| 58 |
ROOT = Path(__file__).parent
|
| 59 |
ASSETS = ROOT / "assets"
|
| 60 |
ABOUT = (ROOT / "about.md").read_text()
|
| 61 |
+
|
| 62 |
+
DATA_REPO_BASE = "https://raw.githubusercontent.com/jeffliulab/new-england-real-time-power-predict-data/main"
|
| 63 |
+
BACKTEST_URL = f"{DATA_REPO_BASE}/data/backtest_rolling_7d.json"
|
| 64 |
+
THIRTY_DAY_CSV_URL = f"{DATA_REPO_BASE}/data/iso_ne_30d.csv"
|
| 65 |
+
LAST_BUILT_URL = f"{DATA_REPO_BASE}/data/last_built.json"
|
| 66 |
+
THIRTY_DAY_CACHE_PATH = Path("/tmp/iso_ne_30d.csv")
|
| 67 |
|
| 68 |
NAVY = "#1A3A5C"
|
| 69 |
ACCENT = "#2E86DE"
|
|
|
|
| 75 |
MODEL, NORM_STATS = load_baseline(ROOT / "checkpoints" / "best.pt", device="cpu")
|
| 76 |
print(f"Loaded baseline ({sum(p.numel() for p in MODEL.parameters()):,} params)")
|
| 77 |
|
| 78 |
+
# Lazy-loaded Chronos pipeline (loaded on first Live forecast click)
|
| 79 |
_CHRONOS = {"pipeline": None}
|
| 80 |
|
| 81 |
|
|
|
|
| 87 |
return _CHRONOS["pipeline"]
|
| 88 |
|
| 89 |
|
| 90 |
+
def _bootstrap_data_repo():
|
| 91 |
+
"""At startup, fetch the latest backtest JSON + 30-day CSV from the
|
| 92 |
+
auxiliary data repo. Saves the CSV to /tmp so iso_ne_fetch can find it.
|
| 93 |
+
Returns (backtest_dict, last_built_dict) or (None, None) if data repo
|
| 94 |
+
unreachable (Space falls back to bundled snapshot)."""
|
| 95 |
+
backtest = None
|
| 96 |
+
last_built = None
|
| 97 |
+
try:
|
| 98 |
+
r = requests.get(BACKTEST_URL, timeout=15)
|
| 99 |
+
r.raise_for_status()
|
| 100 |
+
backtest = r.json()
|
| 101 |
+
print(f"Loaded backtest JSON from data repo: "
|
| 102 |
+
f"{backtest.get('n_forecasts')} forecasts, "
|
| 103 |
+
f"built_at={backtest.get('built_at')}")
|
| 104 |
+
except Exception as e: # noqa: BLE001
|
| 105 |
+
print(f"WARN: failed to fetch backtest JSON ({e}); will use bundled fallback")
|
| 106 |
+
|
| 107 |
+
try:
|
| 108 |
+
r = requests.get(LAST_BUILT_URL, timeout=10)
|
| 109 |
+
r.raise_for_status()
|
| 110 |
+
last_built = r.json()
|
| 111 |
+
except Exception as e: # noqa: BLE001
|
| 112 |
+
print(f"WARN: failed to fetch last_built metadata ({e})")
|
| 113 |
+
|
| 114 |
+
try:
|
| 115 |
+
r = requests.get(THIRTY_DAY_CSV_URL, timeout=20)
|
| 116 |
+
r.raise_for_status()
|
| 117 |
+
THIRTY_DAY_CACHE_PATH.write_bytes(r.content)
|
| 118 |
+
print(f"Cached 30d CSV at {THIRTY_DAY_CACHE_PATH} "
|
| 119 |
+
f"({len(r.content) / 1024:.1f} KB)")
|
| 120 |
+
except Exception as e: # noqa: BLE001
|
| 121 |
+
print(f"WARN: failed to fetch 30d CSV ({e}); Chronos context will use bundled sample")
|
| 122 |
+
|
| 123 |
+
return backtest, last_built
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
BACKTEST, LAST_BUILT = _bootstrap_data_repo()
|
| 127 |
+
if BACKTEST is None:
|
| 128 |
+
# Fallback to bundled snapshot if it exists (shipped with the Space)
|
| 129 |
+
fallback = ASSETS / "backtest_fallback.json"
|
| 130 |
+
if fallback.exists():
|
| 131 |
+
try:
|
| 132 |
+
BACKTEST = json.loads(fallback.read_text())
|
| 133 |
+
print("Using bundled backtest_fallback.json")
|
| 134 |
+
except Exception as e: # noqa: BLE001
|
| 135 |
+
print(f"WARN: bundled fallback also failed: {e}")
|
| 136 |
+
|
| 137 |
+
|
| 138 |
def _now_utc_hour() -> datetime:
|
| 139 |
+
return datetime.now(timezone.utc).replace(
|
| 140 |
+
minute=0, second=0, microsecond=0, tzinfo=None)
|
| 141 |
|
| 142 |
|
| 143 |
# =====================================================================
|
| 144 |
+
# Live forecast (real-time)
|
| 145 |
# =====================================================================
|
| 146 |
|
| 147 |
+
def live_forecast(progress: Optional[gr.Progress] = None):
|
| 148 |
+
"""Pull real HRRR + real ISO-NE per-zone, run baseline + Chronos
|
| 149 |
+
ensemble, and return plots + summary markdown.
|
| 150 |
+
|
| 151 |
+
Uses Gradio's Progress widget for the slow HRRR fetch step.
|
| 152 |
+
"""
|
| 153 |
+
progress = progress or gr.Progress()
|
| 154 |
+
|
| 155 |
target = _now_utc_hour()
|
| 156 |
+
|
| 157 |
+
progress(0.05, desc="Fetching ISO-NE per-zone demand...")
|
| 158 |
+
try:
|
| 159 |
+
hist_demand, demand_src = fetch_recent_demand_mwh(target)
|
| 160 |
+
except Exception as e: # noqa: BLE001
|
| 161 |
+
return _error_panel(f"ISO-NE demand fetch failed: {e}")
|
| 162 |
+
|
| 163 |
+
progress(0.10, desc="Fetching HRRR weather history (24 cycles)...")
|
| 164 |
+
fetched = {"count": 0}
|
| 165 |
+
def _hist_progress(done, total, label):
|
| 166 |
+
fetched["count"] = done
|
| 167 |
+
progress(0.10 + 0.40 * done / total,
|
| 168 |
+
desc=f"HRRR history {done}/{total} — {label}")
|
| 169 |
+
try:
|
| 170 |
+
hist_w_raw = hrrr_fetch_history(target, hours=HISTORY_LEN,
|
| 171 |
+
parallel=4,
|
| 172 |
+
progress=_hist_progress)
|
| 173 |
+
except Exception as e: # noqa: BLE001
|
| 174 |
+
return _error_panel(f"HRRR history fetch failed: {e}")
|
| 175 |
+
|
| 176 |
+
progress(0.55, desc="Fetching HRRR weather forecast (latest long cycle)...")
|
| 177 |
+
def _fut_progress(done, total, label):
|
| 178 |
+
progress(0.55 + 0.20 * done / total,
|
| 179 |
+
desc=f"HRRR forecast {done}/{total} — {label}")
|
| 180 |
+
try:
|
| 181 |
+
fut_w_raw, cycle_for_future, fxx_start = fetch_forecast_for_window(
|
| 182 |
+
target, hours=FUTURE_LEN, parallel=4,
|
| 183 |
+
progress=_fut_progress)
|
| 184 |
+
except Exception as e: # noqa: BLE001
|
| 185 |
+
return _error_panel(f"HRRR forecast fetch failed: {e}")
|
| 186 |
+
|
| 187 |
+
progress(0.80, desc="Running baseline forward pass...")
|
| 188 |
+
hist_cal = encode_range(target - timedelta(hours=HISTORY_LEN), HISTORY_LEN)
|
| 189 |
+
fut_cal = encode_range(target, FUTURE_LEN)
|
| 190 |
+
try:
|
| 191 |
+
baseline_pred = run_forecast(
|
| 192 |
+
MODEL, hist_demand, hist_cal, fut_cal, NORM_STATS,
|
| 193 |
+
hist_weather_raw=hist_w_raw, future_weather_raw=fut_w_raw,
|
| 194 |
+
device="cpu")
|
| 195 |
+
except Exception as e: # noqa: BLE001
|
| 196 |
+
return _error_panel(f"Baseline forward failed: {e}")
|
| 197 |
+
|
| 198 |
+
progress(0.88, desc="Running Chronos-Bolt-mini zero-shot...")
|
| 199 |
+
try:
|
| 200 |
+
long_history, long_src = fetch_long_history_mwh(target, hours=720)
|
| 201 |
+
chronos_pipeline = _get_chronos()
|
| 202 |
+
chronos_pred = run_chronos_zeroshot(chronos_pipeline, long_history)
|
| 203 |
+
except Exception as e: # noqa: BLE001
|
| 204 |
+
return _error_panel(f"Chronos forecast failed: {e}")
|
| 205 |
+
|
| 206 |
+
progress(0.95, desc="Computing ensemble + plotting...")
|
| 207 |
+
ens_pred = per_zone_ensemble(baseline_pred, chronos_pred, ALPHA_PER_ZONE_MINI)
|
| 208 |
+
|
| 209 |
+
line = _live_line_plot(target, hist_demand, ens_pred,
|
| 210 |
+
overlay={"Baseline (with HRRR)": baseline_pred,
|
| 211 |
+
"Chronos zero-shot": chronos_pred})
|
| 212 |
+
bar = _live_bar_plot(target, ens_pred[0])
|
| 213 |
+
|
| 214 |
+
sys_total = ens_pred.sum(axis=1)
|
| 215 |
summary = (
|
| 216 |
+
f"### Forecast issued at **{target.strftime('%Y-%m-%d %H:00')} UTC**\n\n"
|
| 217 |
+
f"**Inputs**\n"
|
| 218 |
+
f"- Demand history: `{demand_src}`\n"
|
| 219 |
+
f"- Chronos context: `{long_src}`\n"
|
| 220 |
+
f"- Weather history: real HRRR f00 analyses, "
|
| 221 |
+
f"24 cycles {(target - timedelta(hours=24)).strftime('%Y-%m-%d %H:00')} → "
|
| 222 |
+
f"{(target - timedelta(hours=1)).strftime('%H:00')} UTC\n"
|
| 223 |
+
f"- Weather forecast: real HRRR cycle "
|
| 224 |
+
f"{cycle_for_future.strftime('%Y-%m-%d %H:00')} UTC, "
|
| 225 |
+
f"f{fxx_start:02d}..f{fxx_start + FUTURE_LEN - 1:02d}\n\n"
|
| 226 |
+
f"**Output**: 24-hour ensemble forecast covering "
|
| 227 |
+
f"**{target.strftime('%H:00')} → {(target + timedelta(hours=24)).strftime('%H:00')} UTC** · "
|
| 228 |
+
f"system-level peak: **{sys_total.max():,.0f} MW**"
|
| 229 |
)
|
| 230 |
+
progress(1.0, desc="Done")
|
| 231 |
return line, bar, summary
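Review note: the body of `per_zone_ensemble` is not shown in this diff, so the following is only a minimal sketch of what a per-zone weighted blend could look like. It assumes that `alpha[z]` is the weight on the baseline for zone `z` and `1 - alpha[z]` the weight on Chronos; the real helper and the meaning of `ALPHA_PER_ZONE_MINI` may differ. The function name `blend_per_zone` and the toy numbers are hypothetical.

```python
def blend_per_zone(baseline, chronos, alpha):
    """Blend two (hours x zones) forecasts with one weight per zone.

    Assumption: alpha[z] weights the baseline and (1 - alpha[z]) weights
    Chronos. This is an illustration, not the project's actual helper.
    """
    if len(baseline) != len(chronos):
        raise ValueError("forecasts must cover the same number of hours")
    blended = []
    for base_row, chr_row in zip(baseline, chronos):
        if not (len(base_row) == len(chr_row) == len(alpha)):
            raise ValueError("zone count mismatch")
        blended.append([a * b + (1.0 - a) * c
                        for a, b, c in zip(alpha, base_row, chr_row)])
    return blended


# Two forecast hours, two hypothetical zones (values in MW).
baseline = [[100.0, 200.0], [110.0, 210.0]]
chronos = [[120.0, 180.0], [130.0, 190.0]]
alpha = [0.5, 1.0]  # zone 0: equal blend; zone 1: baseline only

ensemble = blend_per_zone(baseline, chronos, alpha)
# Row-wise sum over zones, analogous to ens_pred.sum(axis=1) above.
system_total = [sum(row) for row in ensemble]
```

With one weight per zone, a zone where the weather-driven baseline is strong can lean on it while another zone leans on the demand-only model; the system-level peak in the summary is then the maximum of the row sums.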
|
| 232 |
|
| 233 |
|
| 234 |
+
def _error_panel(msg: str):
|
| 235 |
+
return go.Figure(), go.Figure(), f"### ⚠ Forecast failed\n\n{msg}"
|
| 236 |
+
|
| 237 |
+
|
| 238 |
+
def _live_line_plot(target: datetime, hist: np.ndarray, pred: np.ndarray,
|
| 239 |
+
overlay: dict[str, np.ndarray]):
|
| 240 |
+
"""8 panels (4×2). History solid + ensemble dashed; overlays as dotted."""
|
| 241 |
fig = make_subplots(rows=4, cols=2, shared_xaxes=False,
|
| 242 |
+
subplot_titles=ZONE_COLS,
|
| 243 |
+
vertical_spacing=0.10, horizontal_spacing=0.07)
|
| 244 |
hist_t = [target - timedelta(hours=24 - i) for i in range(24)]
|
| 245 |
fut_t = [target + timedelta(hours=i + 1) for i in range(24)]
|
|
|
|
| 246 |
overlay_palette = [GREY, TEAL, AMBER]
|
|
|
|
| 247 |
for i, zone in enumerate(ZONE_COLS):
|
| 248 |
r, c = i // 2 + 1, i % 2 + 1
|
| 249 |
fig.add_trace(go.Scatter(
|
|
|
|
| 254 |
fig.add_trace(go.Scatter(
|
| 255 |
x=fut_t, y=pred[:, i], mode="lines",
|
| 256 |
line=dict(color=ACCENT, width=2.5, dash="dash"),
|
| 257 |
+
name="forecast (ensemble)", showlegend=(i == 0),
|
| 258 |
), row=r, col=c)
|
| 259 |
for k, (label, arr) in enumerate(overlay.items()):
|
|
|
|
| 260 |
fig.add_trace(go.Scatter(
|
| 261 |
x=fut_t, y=arr[:, i], mode="lines",
|
| 262 |
+
line=dict(color=overlay_palette[k], width=1.2, dash="dot"),
|
| 263 |
name=label, showlegend=(i == 0),
|
| 264 |
opacity=0.85,
|
| 265 |
), row=r, col=c)
|
| 266 |
fig.add_vline(x=target, line=dict(color="grey", width=1, dash="dot"),
|
| 267 |
+
row=r, col=c)
|
| 268 |
fig.update_layout(
|
| 269 |
title="Per-zone demand: history (solid) and 24-h forecast (dashed)",
|
| 270 |
height=820, plot_bgcolor="white",
|
|
|
|
| 276 |
return fig
|
| 277 |
|
| 278 |
|
| 279 |
+
def _live_bar_plot(target: datetime, next_hour_pred: np.ndarray):
|
|
|
|
| 280 |
order = np.argsort(next_hour_pred)
|
| 281 |
fig = go.Figure(go.Bar(
|
| 282 |
x=next_hour_pred[order], y=[ZONE_COLS[i] for i in order],
|
|
|
|
| 285 |
textposition="outside",
|
| 286 |
))
|
| 287 |
fig.update_layout(
|
| 288 |
+
title=f"Predicted demand at t+1h "
|
| 289 |
+
f"({(target + timedelta(hours=1)).strftime('%Y-%m-%d %H:00')} UTC)",
|
| 290 |
xaxis_title="MW", height=350, plot_bgcolor="white",
|
| 291 |
margin=dict(l=80, r=40, t=60, b=40),
|
| 292 |
)
|
|
|
|
| 299 |
|
| 300 |
|
| 301 |
# =====================================================================
|
| 302 |
+
# Backtest tab (rolling 7-day, loaded at startup from data repo)
|
| 303 |
# =====================================================================
|
| 304 |
|
| 305 |
def _backtest_overview_plot():
|
|
|
|
| 306 |
if BACKTEST is None:
|
| 307 |
return go.Figure()
|
| 308 |
forecasts = BACKTEST["forecasts"]
|
| 309 |
fig = make_subplots(rows=4, cols=2, shared_xaxes=False,
|
| 310 |
+
subplot_titles=ZONE_COLS,
|
| 311 |
+
vertical_spacing=0.10, horizontal_spacing=0.07)
|
| 312 |
for i, zone in enumerate(ZONE_COLS):
|
| 313 |
r, c = i // 2 + 1, i % 2 + 1
|
| 314 |
for f in forecasts:
|
| 315 |
+
start = datetime.fromisoformat(f["start"])
|
| 316 |
t = [start + timedelta(hours=h) for h in range(24)]
|
| 317 |
truth = np.asarray(f["truth_24h"])[:, i]
|
| 318 |
base = np.asarray(f["baseline"])[:, i]
|
|
|
|
| 341 |
line=dict(color=ACCENT, width=2, dash="dash"),
|
| 342 |
name="ensemble", showlegend=show,
|
| 343 |
), row=r, col=c)
|
| 344 |
+
period = BACKTEST.get("data_period", {})
|
| 345 |
fig.update_layout(
|
| 346 |
+
title=(f"7-day rolling backtest "
|
| 347 |
+
f"({period.get('first_forecast_start', '?')[:10]} → "
|
| 348 |
+
f"{period.get('last_forecast_start', '?')[:10]}) "
|
| 349 |
+
f"— actual vs 3 model variants"),
|
| 350 |
height=900, plot_bgcolor="white",
|
| 351 |
margin=dict(l=40, r=20, t=80, b=40),
|
| 352 |
legend=dict(orientation="h", yanchor="bottom", y=1.02,
|
| 353 |
+
xanchor="right", x=1),
|
| 354 |
)
|
| 355 |
fig.update_yaxes(title_text="MW", title_standoff=4)
|
| 356 |
return fig
|
|
|
|
| 358 |
|
| 359 |
def _backtest_summary_md() -> str:
|
| 360 |
if BACKTEST is None:
|
| 361 |
+
return ("_Rolling backtest unavailable — auxiliary data repo unreachable_\n\n"
|
| 362 |
+
"The Backtest tab loads its data from "
|
| 363 |
+
"[`new-england-real-time-power-predict-data`]"
|
| 364 |
+
"(https://github.com/jeffliulab/new-england-real-time-power-predict-data) "
|
| 365 |
+
"which a GitHub Actions cron refreshes every day.")
|
| 366 |
s = BACKTEST["summary"]
|
| 367 |
+
period = BACKTEST.get("data_period", {})
|
| 368 |
rows = []
|
| 369 |
rows.append("| Model | " + " | ".join(ZONE_COLS) + " | **Overall** |")
|
| 370 |
rows.append("|---|" + "|".join(["---"] * (len(ZONE_COLS) + 1)) + "|")
|
| 371 |
for key, label in (("baseline", "Baseline (real HRRR)"),
|
| 372 |
+
("chronos", "Chronos-Bolt-mini (zero-shot)"),
|
| 373 |
+
("ensemble", "Ensemble (per-zone α)")):
|
| 374 |
per_zone = " | ".join(f"{s[key]['per_zone'][z]:.2f}" for z in ZONE_COLS)
|
| 375 |
rows.append(f"| {label} | {per_zone} | **{s[key]['overall']:.2f}** |")
|
| 376 |
table = "\n".join(rows)
|
| 377 |
+
built_at = BACKTEST.get("built_at", "?")
|
| 378 |
return (
|
| 379 |
+
f"### Last 7 days of forecasts — per-zone & overall MAPE (%)\n\n"
|
| 380 |
+
f"_Window: {period.get('first_forecast_start', '?')[:16]} UTC → "
|
| 381 |
+
f"{period.get('last_forecast_start', '?')[:16]} UTC · "
|
| 382 |
+
f"refreshed {built_at[:16]} UTC_\n\n"
|
| 383 |
f"{table}\n\n"
|
| 384 |
+
f"_Each forecast issues a 24-hour prediction at 00:00 UTC. The baseline uses "
|
| 385 |
+
f"real HRRR f00 analyses for the history window (24 cycles) and HRRR f01..f24 "
|
| 386 |
+
f"forecasts from the cycle issued at T-1 for the future window — strict deployable "
|
| 387 |
+
f"backtest with no future-data leak. See **About** for the disclosure on the "
|
| 388 |
+
f"training-time future_weather mismatch._"
|
|
|
|
| 389 |
)
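Review note: the table above reads `s[key]['per_zone'][z]` and `s[key]['overall']`, but the code that builds those numbers lives in the data repo and is not part of this diff. A minimal sketch of how such a summary could be computed is given below. It assumes that "overall" means MAPE over all zone-hours pooled together and that the truth values are non-zero; both are assumptions, and the names `mape_percent` and `summarize` are hypothetical.

```python
def mape_percent(truth, pred):
    """Mean absolute percentage error, in percent, over paired values."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and equal length")
    return 100.0 * sum(abs(p - t) / abs(t)
                       for t, p in zip(truth, pred)) / len(truth)


def summarize(truth, pred, zones):
    """Return {'per_zone': {zone: MAPE}, 'overall': MAPE} for (hours x zones) data."""
    per_zone = {}
    for z, name in enumerate(zones):
        per_zone[name] = mape_percent([row[z] for row in truth],
                                      [row[z] for row in pred])
    # Assumption: 'overall' pools every zone-hour rather than averaging zone MAPEs.
    flat_truth = [v for row in truth for v in row]
    flat_pred = [v for row in pred for v in row]
    return {"per_zone": per_zone, "overall": mape_percent(flat_truth, flat_pred)}


zones = ["A", "B"]  # hypothetical zone names
truth = [[100.0, 200.0], [100.0, 200.0]]
pred = [[110.0, 200.0], [90.0, 180.0]]
summary = summarize(truth, pred, zones)
```

Pooling and averaging coincide when every zone contributes the same number of hours, as in this toy case, so the choice only matters if some zone-hours are missing.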
|
| 390 |
|
| 391 |
|
| 392 |
def _backtest_bars():
|
|
|
|
| 393 |
if BACKTEST is None:
|
| 394 |
return go.Figure()
|
| 395 |
s = BACKTEST["summary"]
|
| 396 |
+
labels = ["Baseline\n(real HRRR)", "Chronos-Bolt-mini\n(zero-shot)",
|
| 397 |
+
"Ensemble\n(per-zone α)"]
|
| 398 |
+
values = [s["baseline"]["overall"], s["chronos"]["overall"],
|
| 399 |
+
s["ensemble"]["overall"]]
|
| 400 |
fig = go.Figure(go.Bar(
|
| 401 |
x=labels, y=values, marker_color=[GREY, TEAL, ACCENT],
|
| 402 |
text=[f"{v:.2f}%" for v in values], textposition="outside",
|
|
|
|
| 410 |
|
| 411 |
|
| 412 |
# =====================================================================
|
| 413 |
+
# Gradio UI
|
| 414 |
# =====================================================================
|
| 415 |
|
| 416 |
with gr.Blocks(title="ISO-NE Energy Demand Forecast",
|
| 417 |
+
theme=gr.themes.Default(primary_hue="blue")) as demo:
|
| 418 |
gr.Markdown(
|
| 419 |
"# ⚡ Multi-Modal Deep Learning for Energy Demand Forecasting\n"
|
| 420 |
"**Author:** Pang Liu · Tufts CS-137 · "
|
| 421 |
"[GitHub](https://github.com/jeffliulab/real-time-power-predict)\n\n"
|
| 422 |
+
"Live tab pulls real ISO-NE per-zone demand + real HRRR weather "
|
| 423 |
+
"(history analyses + forecast-cycle predictions) and runs the trained "
|
| 424 |
+
"CNN-Transformer baseline blended with Chronos-Bolt-mini in a per-zone "
|
| 425 |
+
"weighted ensemble. The Backtest tab shows the same model on the most "
|
| 426 |
+
"recent 7 fully-published days, refreshed daily by GitHub Actions cron "
|
| 427 |
+
"in the auxiliary data repo."
|
| 428 |
)
|
| 429 |
with gr.Row():
|
| 430 |
+
run_btn = gr.Button("🔮 Forecast next 24 h (now)",
|
| 431 |
+
variant="primary", scale=1, size="lg")
|
| 432 |
summary_md = gr.Markdown()
|
| 433 |
with gr.Tabs():
|
| 434 |
with gr.Tab("Real-time forecast"):
|
| 435 |
line_plot = gr.Plot(label="Per-zone history + forecast")
|
| 436 |
bar_plot = gr.Plot(label="Predicted next-hour demand")
|
| 437 |
gr.Markdown(_alpha_table_md())
|
| 438 |
+
with gr.Tab("Backtest (last 7 days)"):
|
| 439 |
gr.Markdown(
|
| 440 |
+
"_Strict-discipline backtest_ — at each forecast time T the "
|
| 441 |
+
"model sees only data available before T. History weather: "
|
| 442 |
+
"24 HRRR f00 analyses; future weather: f01..f24 from cycle "
|
| 443 |
+
"T-1 (the most recent cycle issued before T)."
|
|
|
|
|
|
|
| 444 |
)
|
| 445 |
backtest_plot = gr.Plot(value=_backtest_overview_plot(),
|
| 446 |
+
label="7-day per-zone comparison")
|
| 447 |
backtest_bars = gr.Plot(value=_backtest_bars(),
|
| 448 |
+
label="Overall MAPE")
|
| 449 |
gr.Markdown(_backtest_summary_md())
|
| 450 |
with gr.Tab("About"):
|
| 451 |
gr.Markdown(ABOUT)
|
|
|
|
| 460 |
label="Baseline CNN-Transformer architecture",
|
| 461 |
show_label=True)
|
| 462 |
|
| 463 |
+
run_btn.click(live_forecast,
|
| 464 |
outputs=[line_plot, bar_plot, summary_md])
|
|
|
|
|
|
|
| 465 |
|
| 466 |
|
| 467 |
if __name__ == "__main__":
|
hrrr_fetch.py
ADDED
|
@@ -0,0 +1,363 @@
|
|
| 1 |
+
"""
|
| 2 |
+
Real-time HRRR weather fetcher for the predict-power Space.
|
| 3 |
+
|
| 4 |
+
This is the runtime counterpart to ``scripts/data_preparation/fetch_hrrr_weather.py``
|
| 5 |
+
(used to build the training set). It MUST produce arrays in the same
|
| 6 |
+
shape, channel order, and grid as training, otherwise the model sees an
|
| 7 |
+
out-of-distribution input. Specifically:
|
| 8 |
+
|
| 9 |
+
- 7 channels in fixed order:
|
| 10 |
+
[TMP_2m, RH_2m, UGRD_10m, VGRD_10m, GUST_surface, DSWRF_surface, APCP_1hr]
|
| 11 |
+
- NE bbox: lat 40.5-47.5 N, lon -74.0 to -66.0 (West)
|
| 12 |
+
- Regridded to a regular 450 lat-row x 449 lon-col grid via 4-nearest-neighbour
|
| 13 |
+
inverse-distance weights (see ``_build_regrid_weights``), NOT direct slicing of the native Lambert-Conformal grid
|
| 14 |
+
|
| 15 |
+
We fetch from the public ``noaa-hrrr-bdp-pds`` AWS S3 bucket via the
|
| 16 |
+
Herbie library (proven path; same as training).
|
| 17 |
+
|
| 18 |
+
Two top-level entry points:
|
| 19 |
+
- ``fetch_history(end_dt, hours=24)`` returns ``(hours, 450, 449, 7)``,
|
| 20 |
+
one f00 analysis per requested hour
|
| 21 |
+
- ``fetch_forecast(cycle_dt, hours=24)`` returns ``(hours, 450, 449, 7)``,
|
| 22 |
+
cycle_dt's f01..f{hours} forecast hours
|
| 23 |
+
|
| 24 |
+
Both paths are cached at ``/tmp/hrrr_cache/{cycle_YYYYMMDDHH}_f{NN}.npz``.
|
| 25 |
+
The cache survives within an HF Space uptime session and is wiped on sleep.
|
| 26 |
+
"""
|
| 27 |
+
|
| 28 |
+
from __future__ import annotations
|
| 29 |
+
|
| 30 |
+
import logging
|
| 31 |
+
import os
|
| 32 |
+
from concurrent.futures import ThreadPoolExecutor, as_completed
|
| 33 |
+
from datetime import datetime, timedelta, timezone
|
| 34 |
+
from pathlib import Path
|
| 35 |
+
from typing import Callable, Optional, Sequence
|
| 36 |
+
|
| 37 |
+
import numpy as np
|
| 38 |
+
|
| 39 |
+
logger = logging.getLogger(__name__)
|
| 40 |
+
|
| 41 |
+
# === Match training pipeline EXACTLY ===
|
| 42 |
+
_BBOX = {"lat_min": 40.5, "lat_max": 47.5,
|
| 43 |
+
"lon_min": -74.0, "lon_max": -66.0}
|
| 44 |
+
GRID_H = 450 # lat rows
|
| 45 |
+
GRID_W = 449 # lon cols
|
| 46 |
+
N_CHANNELS = 7
|
| 47 |
+
|
| 48 |
+
# Target lat/lon grid (geographic, not native HRRR Lambert-Conformal)
|
| 49 |
+
_LAT = np.linspace(_BBOX["lat_min"], _BBOX["lat_max"], GRID_H)
|
| 50 |
+
_LON = np.linspace(_BBOX["lon_min"], _BBOX["lon_max"], GRID_W)
|
| 51 |
+
|
| 52 |
+
# Channel definitions: (name, herbie searchString)
|
| 53 |
+
_CHANNELS: list[tuple[str, str]] = [
|
| 54 |
+
("TMP", ":TMP:2 m above ground"),
|
| 55 |
+
("RH", ":RH:2 m above ground"),
|
| 56 |
+
("UGRD", ":UGRD:10 m above ground"),
|
| 57 |
+
("VGRD", ":VGRD:10 m above ground"),
|
| 58 |
+
("GUST", ":GUST:surface"),
|
| 59 |
+
("DSWRF", ":DSWRF:surface"),
|
| 60 |
+
("APCP_1hr", ":APCP:surface:0-1 hour acc"),
|
| 61 |
+
]
|
| 62 |
+
|
| 63 |
+
CACHE_DIR = Path(os.environ.get("HRRR_CACHE_DIR", "/tmp/hrrr_cache"))
|
| 64 |
+
CACHE_DIR.mkdir(parents=True, exist_ok=True)
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
def _cache_path(cycle_dt: datetime, fxx: int) -> Path:
|
| 68 |
+
return CACHE_DIR / f"{cycle_dt.strftime('%Y%m%d%H')}_f{fxx:02d}.npz"
|
| 69 |
+
|
| 70 |
+
|
| 71 |
+
def _hour_floor_utc(dt: datetime) -> datetime:
|
| 72 |
+
if dt.tzinfo is None:
|
| 73 |
+
dt = dt.replace(tzinfo=timezone.utc)
|
| 74 |
+
dt = dt.astimezone(timezone.utc)
|
| 75 |
+
return dt.replace(minute=0, second=0, microsecond=0, tzinfo=None)
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
# --- regridding weights (computed lazily, then cached for the process) ---
|
| 79 |
+
# HRRR's native Lambert-Conformal grid is fixed across cycles, so we can
|
| 80 |
+
# precompute (mask, idxs, weights) once from any sample dataset.
|
| 81 |
+
# Per-channel regrid is then a gather plus a 4-neighbour weighted sum (~10 ms on cpu-basic).
|
| 82 |
+
_REGRID_CACHE: dict = {}
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
def _build_regrid_weights(lat2d: np.ndarray, lon2d_signed: np.ndarray):
|
| 86 |
+
"""Build cropping mask + 4-NN inverse-distance weights for our target grid.
|
| 87 |
+
|
| 88 |
+
Returns dict with keys:
|
| 89 |
+
- ``mask``: bool array (1059, 1799) selecting cells inside an NE
|
| 90 |
+
bounding box that contains our target grid with a 1.5° margin
|
| 91 |
+
- ``idxs``: (450*449, 4) int32 — indices into the masked source array
|
| 92 |
+
- ``weights``: (450*449, 4) float32 — sums to 1 along axis=1
|
| 93 |
+
"""
|
| 94 |
+
from scipy.spatial import cKDTree # noqa: WPS433
|
| 95 |
+
|
| 96 |
+
# Crop with margin so target-grid corners always have neighbors in source
|
| 97 |
+
mask = ((lat2d >= _BBOX["lat_min"] - 1.5)
|
| 98 |
+
& (lat2d <= _BBOX["lat_max"] + 1.5)
|
| 99 |
+
& (lon2d_signed >= _BBOX["lon_min"] - 1.5)
|
| 100 |
+
& (lon2d_signed <= _BBOX["lon_max"] + 1.5))
|
| 101 |
+
if mask.sum() == 0:
|
| 102 |
+
raise RuntimeError("Bounding-box mask is empty; HRRR grid mismatch?")
|
| 103 |
+
|
| 104 |
+
src_pts = np.stack(
|
| 105 |
+
[lat2d[mask].astype(np.float64),
|
| 106 |
+
lon2d_signed[mask].astype(np.float64)],
|
| 107 |
+
axis=-1)
|
| 108 |
+
LL, LN = np.meshgrid(_LAT, _LON, indexing="ij")
|
| 109 |
+
tgt_pts = np.stack([LL.ravel(), LN.ravel()], axis=-1)
|
| 110 |
+
|
| 111 |
+
tree = cKDTree(src_pts)
|
| 112 |
+
dists, idxs = tree.query(tgt_pts, k=4)
|
| 113 |
+
# Inverse-distance weights, normalized
|
| 114 |
+
inv_d = 1.0 / np.maximum(dists, 1e-9)
|
| 115 |
+
w = (inv_d / inv_d.sum(axis=1, keepdims=True)).astype(np.float32)
|
| 116 |
+
return {"mask": mask, "idxs": idxs.astype(np.int32), "weights": w}
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
def _regrid(field2d: np.ndarray, weights_pack: dict) -> np.ndarray:
|
| 120 |
+
"""Apply precomputed mask + weights to a (1059, 1799) HRRR field, return
|
| 121 |
+
(450, 449) float32 on the regular lat/lon target grid."""
|
| 122 |
+
cropped = field2d[weights_pack["mask"]].astype(np.float32)
|
| 123 |
+
out = (cropped[weights_pack["idxs"]] * weights_pack["weights"]).sum(axis=1)
|
| 124 |
+
return out.reshape(GRID_H, GRID_W)
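Review note: the precompute-once, apply-per-channel pattern used by `_build_regrid_weights` and `_regrid` can be sketched on a toy grid. The sketch below deliberately swaps `cKDTree` for a brute-force neighbour search in plain Python so that it runs without SciPy; like the code above, it treats lat/lon degrees as Euclidean coordinates. The function names and the four-point "grid" are hypothetical.

```python
import math


def build_idw_weights(src_pts, tgt_pts, k=4, eps=1e-9):
    """For each target point, return the indices of its k nearest source
    points and inverse-distance weights normalised to sum to 1."""
    idxs, weights = [], []
    for tlat, tlon in tgt_pts:
        dists = sorted((math.hypot(tlat - slat, tlon - slon), j)
                       for j, (slat, slon) in enumerate(src_pts))
        nearest = dists[:k]
        # Clamp tiny distances so an exact hit does not divide by zero.
        inv = [1.0 / max(d, eps) for d, _ in nearest]
        total = sum(inv)
        idxs.append([j for _, j in nearest])
        weights.append([w / total for w in inv])
    return idxs, weights


def apply_weights(values, idxs, weights):
    """Gather the neighbour values and take the weighted sum per target point."""
    return [sum(values[j] * w for j, w in zip(row_i, row_w))
            for row_i, row_w in zip(idxs, weights)]


# Toy source grid: the four corners of a unit square (lat, lon).
src_pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
tgt_pts = [(0.5, 0.5), (0.0, 0.0)]  # centre of the square, and an exact hit

# Weights depend only on geometry, so they are built once...
idxs, weights = build_idw_weights(src_pts, tgt_pts)

# ...and reused for every channel.
temperature = [10.0, 20.0, 30.0, 40.0]
humidity = [50.0, 50.0, 50.0, 50.0]
temp_on_target = apply_weights(temperature, idxs, weights)
hum_on_target = apply_weights(humidity, idxs, weights)
```

At the centre all four neighbours are equidistant, so the result is their plain average; at an exact hit the clamped distance makes that one source point dominate. Because the weights are normalised, a constant field stays constant after regridding.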
|
| 125 |
+
|
| 126 |
+
|
| 127 |
+
def _fetch_one_via_herbie(cycle_dt: datetime, fxx: int) -> np.ndarray:
|
| 128 |
+
"""Fetch one (cycle, forecast-hour) pair, return (450, 449, 7) float32.
|
| 129 |
+
|
| 130 |
+
Caller is responsible for caching; this function always hits the network.
|
| 131 |
+
Raises RuntimeError on any failure.
|
| 132 |
+
"""
|
| 133 |
+
try:
|
| 134 |
+
from herbie import Herbie # noqa: WPS433 (optional heavy dep)
|
| 135 |
+
except ImportError as e:
|
| 136 |
+
raise RuntimeError(
|
| 137 |
+
f"hrrr_fetch.py requires herbie-data: {e}") from e
|
| 138 |
+
|
| 139 |
+
H = Herbie(
|
| 140 |
+
cycle_dt.strftime("%Y-%m-%d %H:00"),
|
| 141 |
+
model="hrrr",
|
| 142 |
+
product="sfc",
|
| 143 |
+
fxx=fxx,
|
| 144 |
+
verbose=False,
|
| 145 |
+
)
|
| 146 |
+
channels: list[np.ndarray] = []
|
| 147 |
+
for ch_name, regex in _CHANNELS:
|
| 148 |
+
try:
|
| 149 |
+
# Newer Herbie (>=2024.x) renamed `searchString` to `search`
|
| 150 |
+
ds = H.xarray(search=regex, verbose=False)
|
| 151 |
+
except Exception as e: # noqa: BLE001
|
| 152 |
+
# APCP accumulation window varies with forecast hour:
|
| 153 |
+
# f00 has no APCP, f01 has "0-1 hour acc" (matches our regex),
|
| 154 |
+
# f02 has "0-2 hour acc" or "1-2 hour acc", etc. We zero-fill
|
| 155 |
+
# any APCP fetch failure (the training mean is near zero in
|
| 156 |
+
# mm anyway, so after z-scoring the model sees ~0).
|
| 157 |
+
if ch_name == "APCP_1hr":
|
| 158 |
+
logger.info("APCP_1hr unavailable at %s f%02d (%s); using zero",
|
| 159 |
+
cycle_dt, fxx,
|
| 160 |
+
type(e).__name__ if not str(e) else str(e)[:80])
|
| 161 |
+
channels.append(np.zeros((GRID_H, GRID_W), dtype=np.float32))
|
| 162 |
+
continue
|
| 163 |
+
raise RuntimeError(
|
| 164 |
+
f"Herbie xarray() failed for {ch_name} at "
|
| 165 |
+
f"{cycle_dt.isoformat()} f{fxx:02d}: {e}") from e
|
| 166 |
+
var = list(ds.data_vars)[0]
|
| 167 |
+
arr = ds[var]
|
| 168 |
+
field2d = np.squeeze(arr.values)
|
| 169 |
+
if field2d.shape != (1059, 1799):
|
| 170 |
+
raise RuntimeError(
|
| 171 |
+
f"unexpected HRRR field shape {field2d.shape} for {ch_name}")
|
| 172 |
+
|
| 173 |
+
# Initialize regrid weights once per process from the first dataset
|
| 174 |
+
if "weights_pack" not in _REGRID_CACHE:
|
| 175 |
+
lat2d = arr.coords["latitude"].values
|
| 176 |
+
lon2d = arr.coords["longitude"].values
|
| 177 |
+
lon2d_signed = np.where(lon2d > 180, lon2d - 360, lon2d)
|
| 178 |
+
_REGRID_CACHE["weights_pack"] = _build_regrid_weights(
|
| 179 |
+
lat2d, lon2d_signed)
|
| 180 |
+
logger.info("Built HRRR -> NE-grid regrid weights "
|
| 181 |
+
"(one-time setup, ~0.3s)")
|
| 182 |
+
|
| 183 |
+
regridded = _regrid(field2d, _REGRID_CACHE["weights_pack"])
|
| 184 |
+
channels.append(regridded.astype(np.float32))
|
| 185 |
+
|
| 186 |
+
tensor = np.stack(channels, axis=-1)
|
| 187 |
+
if np.isnan(tensor).any():
|
| 188 |
+
raise RuntimeError(
|
| 189 |
+
f"NaN in regridded HRRR tensor for "
|
| 190 |
+
f"{cycle_dt.isoformat()} f{fxx:02d}")
|
| 191 |
+
return tensor
|
| 192 |
+
|
| 193 |
+
|
| 194 |
+
def _fetch_with_cache(cycle_dt: datetime, fxx: int) -> np.ndarray:
|
| 195 |
+
"""Fetch one (cycle, fxx) pair via cache or network."""
|
| 196 |
+
p = _cache_path(cycle_dt, fxx)
|
| 197 |
+
if p.exists():
|
| 198 |
+
try:
|
| 199 |
+
with np.load(p) as f:
|
| 200 |
+
return f["weather"].astype(np.float32)
|
| 201 |
+
except Exception: # corrupt cache file, refetch
|
| 202 |
+
p.unlink(missing_ok=True)
|
| 203 |
+
tensor = _fetch_one_via_herbie(cycle_dt, fxx)
|
| 204 |
+
# Store as float16 to halve disk usage (~2.8 MB/file vs 5.6 MB)
|
| 205 |
+
np.savez_compressed(p, weather=tensor.astype(np.float16))
|
| 206 |
+
return tensor
|
| 207 |
+
|
| 208 |
+
|
| 209 |
+
def _fetch_parallel(jobs: Sequence[tuple[datetime, int]],
|
| 210 |
+
parallel: int = 8,
|
| 211 |
+
progress: Optional[Callable[[int, int, str], None]] = None,
|
| 212 |
+
) -> dict[tuple[datetime, int], np.ndarray]:
|
| 213 |
+
"""Fetch many (cycle_dt, fxx) pairs in parallel; return dict by job key."""
|
| 214 |
+
if not jobs:
|
| 215 |
+
return {}
|
| 216 |
+
out: dict[tuple[datetime, int], np.ndarray] = {}
|
| 217 |
+
if parallel <= 1:
|
| 218 |
+
for i, (cdt, fxx) in enumerate(jobs):
|
| 219 |
+
out[(cdt, fxx)] = _fetch_with_cache(cdt, fxx)
|
| 220 |
+
if progress:
|
| 221 |
+
progress(i + 1, len(jobs), f"{cdt.strftime('%Y-%m-%d %H')} f{fxx:02d}")
|
| 222 |
+
return out
|
| 223 |
+
|
| 224 |
+
with ThreadPoolExecutor(max_workers=parallel) as ex:
|
| 225 |
+
futures = {ex.submit(_fetch_with_cache, cdt, fxx): (cdt, fxx)
|
| 226 |
+
for cdt, fxx in jobs}
|
| 227 |
+
completed = 0
|
| 228 |
+
for fut in as_completed(futures):
|
| 229 |
+
key = futures[fut]
|
| 230 |
+
out[key] = fut.result()
|
| 231 |
+
completed += 1
|
| 232 |
+
if progress:
|
| 233 |
+
cdt, fxx = key
|
| 234 |
+
progress(completed, len(jobs),
|
| 235 |
+
f"{cdt.strftime('%Y-%m-%d %H')} f{fxx:02d}")
|
| 236 |
+
return out
|
| 237 |
+
|
| 238 |
+
|
| 239 |
+
# =====================================================================
|
| 240 |
+
# Public API
|
| 241 |
+
# =====================================================================
|
| 242 |
+
|
| 243 |
+
def fetch_history(end_dt: datetime, hours: int = 24,
|
| 244 |
+
parallel: int = 8,
|
| 245 |
+
progress: Optional[Callable[[int, int, str], None]] = None,
|
| 246 |
+
) -> np.ndarray:
|
| 247 |
+
"""Return ``(hours, 450, 449, 7)`` float32 of HRRR f00 analyses for
|
| 248 |
+
the inclusive window ``[end_dt - hours, end_dt - 1h]``.
|
| 249 |
+
|
| 250 |
+
Each requested valid-hour ``H`` uses cycle ``H`` with fxx=0 (i.e.,
|
| 251 |
+
the analysis at that valid hour), matching how the training data
|
| 252 |
+
was constructed.
|
| 253 |
+
"""
|
| 254 |
+
end_dt = _hour_floor_utc(end_dt)
|
| 255 |
+
valid_hours = [end_dt - timedelta(hours=hours - i) for i in range(hours)]
|
| 256 |
+
jobs = [(vh, 0) for vh in valid_hours]
|
| 257 |
+
fetched = _fetch_parallel(jobs, parallel=parallel, progress=progress)
|
| 258 |
+
out = np.stack([fetched[(vh, 0)] for vh in valid_hours], axis=0)
|
| 259 |
+
return out
|
| 260 |
+
|
| 261 |
+
|
| 262 |
+
# HRRR cycles with extended (0-48 h) forecasts. Other hourly cycles
|
| 263 |
+
# (01/02/04/05/...) only go out to f18, so we can't get 24 h from them.
|
| 264 |
+
LONG_CYCLE_HOURS = (0, 6, 12, 18)
|
| 265 |
+
|
| 266 |
+
|
| 267 |
+
def _latest_long_cycle_le(dt: datetime) -> datetime:
|
| 268 |
+
"""Return the most recent HRRR long cycle (00/06/12/18 UTC) <= dt."""
|
| 269 |
+
dt = _hour_floor_utc(dt)
|
| 270 |
+
while dt.hour not in LONG_CYCLE_HOURS:
|
| 271 |
+
dt -= timedelta(hours=1)
|
| 272 |
+
return dt
|
| 273 |
+
|
| 274 |
+
|
| 275 |
+
def fetch_forecast_for_window(target_start: datetime, hours: int = 24,
|
| 276 |
+
publication_lag_hours: int = 2,
|
| 277 |
+
parallel: int = 8,
|
| 278 |
+
progress: Optional[Callable[[int, int, str], None]] = None,
|
| 279 |
+
) -> tuple[np.ndarray, datetime, int]:
|
| 280 |
+
"""Return ``(hours, 450, 449, 7)`` covering valid hours
|
| 281 |
+
``[target_start, target_start + hours - 1]``, using the most recent
|
| 282 |
+
HRRR long cycle (one of 00/06/12/18 UTC) that was published before
|
| 283 |
+
``target_start`` (with ``publication_lag_hours`` margin to allow for
|
| 284 |
+
cycle processing delay).
|
| 285 |
+
|
| 286 |
+
Returns ``(weather, cycle_dt, fxx_start)`` so the caller can log
|
| 287 |
+
which cycle was used.
|
| 288 |
+
"""
|
| 289 |
+
target_start = _hour_floor_utc(target_start)
|
| 290 |
+
cutoff = target_start - timedelta(hours=publication_lag_hours)
|
| 291 |
+
cycle_dt = _latest_long_cycle_le(cutoff)
|
| 292 |
+
fxx_start = int((target_start - cycle_dt).total_seconds() / 3600)
|
| 293 |
+
jobs = [(cycle_dt, fxx) for fxx in range(fxx_start, fxx_start + hours)]
|
| 294 |
+
fetched = _fetch_parallel(jobs, parallel=parallel, progress=progress)
|
| 295 |
+
out = np.stack([fetched[(cycle_dt, fxx)]
|
| 296 |
+
for fxx in range(fxx_start, fxx_start + hours)], axis=0)
|
| 297 |
+
return out, cycle_dt, fxx_start
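Review note: the cycle-selection arithmetic in `fetch_forecast_for_window` is easy to check in isolation. The sketch below reproduces only that step (no network access) under the assumption that `target_start` is already a naive, hour-floored UTC datetime; `pick_cycle` is a hypothetical name used for illustration.

```python
from datetime import datetime, timedelta

LONG_CYCLE_HOURS = (0, 6, 12, 18)  # cycles with extended (0-48 h) forecasts


def pick_cycle(target_start, publication_lag_hours=2):
    """Return (cycle_dt, fxx_start) for a forecast window starting at target_start.

    Assumes target_start is a naive UTC datetime on an hour boundary.
    """
    cutoff = target_start - timedelta(hours=publication_lag_hours)
    cycle = cutoff.replace(minute=0, second=0, microsecond=0)
    # Walk back hour by hour to the latest long cycle at or before the cutoff.
    while cycle.hour not in LONG_CYCLE_HOURS:
        cycle -= timedelta(hours=1)
    fxx_start = int((target_start - cycle).total_seconds() // 3600)
    return cycle, fxx_start


target = datetime(2024, 1, 15, 14, 0)
cycle, fxx_start = pick_cycle(target)
last_fxx = fxx_start + 24 - 1  # final forecast hour for a 24-hour window
```

With the default 2-hour lag, the cutoff is at most 5 hours past a long cycle, so `fxx_start` ranges from 2 to 7 and a 24-hour window ends no later than f30, which stays inside the 48-hour range of the long cycles.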
|
| 298 |
+
|
| 299 |
+
|
| 300 |
+
def fetch_forecast(cycle_dt: datetime, hours: int = 24,
|
| 301 |
+
parallel: int = 8,
|
| 302 |
+
progress: Optional[Callable[[int, int, str], None]] = None,
|
| 303 |
+
) -> np.ndarray:
|
| 304 |
+
"""Backwards-compatible entry point: fetch f01..f{hours} from a specific cycle.
|
| 305 |
+
|
| 306 |
+
NOTE: only long cycles (00/06/12/18 UTC) reliably cover 24+ hours.
|
| 307 |
+
For automatic cycle selection, prefer ``fetch_forecast_for_window``.
|
| 308 |
+
"""
|
| 309 |
+
cycle_dt = _hour_floor_utc(cycle_dt)
|
| 310 |
+
jobs = [(cycle_dt, fxx) for fxx in range(1, hours + 1)]
|
| 311 |
+
fetched = _fetch_parallel(jobs, parallel=parallel, progress=progress)
|
| 312 |
+
out = np.stack([fetched[(cycle_dt, fxx)] for fxx in range(1, hours + 1)],
|
| 313 |
+
axis=0)
|
| 314 |
+
return out
|
| 315 |
+
|
| 316 |
+
|
| 317 |
+
def latest_available_cycle(target_dt: datetime,
|
| 318 |
+
max_lookback_hours: int = 4,
|
| 319 |
+
) -> datetime:
|
| 320 |
+
"""Find the most recent HRRR cycle <= ``target_dt`` whose forecast
|
| 321 |
+
hours appear to be on S3 (HRRR has ~1-2 hour publication lag).
|
| 322 |
+
|
| 323 |
+
We probe by trying to instantiate Herbie for each cycle from
|
| 324 |
+
``target_dt`` backwards, succeeding when ``H.grib`` resolves.
|
| 325 |
+
Returns the cycle datetime (UTC, hour-floored, naive).
|
| 326 |
+
"""
|
| 327 |
+
target_dt = _hour_floor_utc(target_dt)
|
| 328 |
+
try:
|
| 329 |
+
from herbie import Herbie # noqa: WPS433
|
| 330 |
+
except ImportError as e:
|
| 331 |
+
raise RuntimeError(f"herbie-data not installed: {e}") from e
|
| 332 |
+
|
| 333 |
+
for back in range(0, max_lookback_hours + 1):
|
| 334 |
+
cdt = target_dt - timedelta(hours=back)
|
| 335 |
+
try:
|
| 336 |
+
H = Herbie(cdt.strftime("%Y-%m-%d %H:00"),
|
| 337 |
+
model="hrrr", product="sfc", fxx=1, verbose=False)
|
| 338 |
+
if H.grib is not None:
|
| 339 |
+
return cdt
|
| 340 |
+
except Exception: # noqa: BLE001
|
| 341 |
+
continue
|
| 342 |
+
raise RuntimeError(
|
| 343 |
+
f"No HRRR cycle available within last {max_lookback_hours}h of "
|
| 344 |
+
f"{target_dt.isoformat()}")
|
| 345 |
+
|
| 346 |
+
|
| 347 |
+
if __name__ == "__main__":
|
| 348 |
+
# Smoke test: fetch one f00 + one f01 from yesterday's noon cycle
|
| 349 |
+
logging.basicConfig(level=logging.INFO, format="%(message)s")
|
| 350 |
+
yesterday_noon = (datetime.now(timezone.utc) - timedelta(days=1)
|
| 351 |
+
).replace(hour=12, minute=0, second=0, microsecond=0,
|
| 352 |
+
tzinfo=None)
|
| 353 |
+
|
| 354 |
+
print(f"Smoke test cycle: {yesterday_noon} UTC")
|
| 355 |
+
arr = _fetch_with_cache(yesterday_noon, 0)
|
| 356 |
+
print(f" f00: shape={arr.shape}, dtype={arr.dtype}, "
|
| 357 |
+
f"mean per channel: " + ", ".join(
|
| 358 |
+
f"{name}={arr[..., i].mean():.2f}" for i, (name, _) in enumerate(_CHANNELS)))
|
| 359 |
+
arr1 = _fetch_with_cache(yesterday_noon, 1)
|
| 360 |
+
print(f" f01: shape={arr1.shape}, dtype={arr1.dtype}, "
|
| 361 |
+
f"mean per channel: " + ", ".join(
|
| 362 |
+
f"{name}={arr1[..., i].mean():.2f}" for i, (name, _) in enumerate(_CHANNELS)))
|
| 363 |
+
print(f" cache dir: {CACHE_DIR}, n files: {len(list(CACHE_DIR.glob('*.npz')))}")
|
iso_ne_fetch.py
CHANGED
|
@@ -1,27 +1,21 @@
|
|
| 1 |
"""
|
| 2 |
-
|
| 3 |
|
| 4 |
-
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
network, so it almost always falls through.
|
| 16 |
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
True per-zone real-time data requires an authenticated ISO Express
|
| 22 |
-
account. The proportional split is a reasonable demo approximation:
|
| 23 |
-
the model sees real recent ISO-NE-wide demand patterns; only the
|
| 24 |
-
per-zone allocation is fixed.
|
| 25 |
"""
|
| 26 |
|
| 27 |
from __future__ import annotations
|
|
@@ -29,6 +23,7 @@ from __future__ import annotations
|
|
| 29 |
import logging
|
| 30 |
import os
|
| 31 |
from datetime import datetime, timedelta, timezone
|
|
|
|
| 32 |
from pathlib import Path
|
| 33 |
from typing import Optional
|
| 34 |
|
|
@@ -36,214 +31,165 @@ import numpy as np
|
|
| 36 |
import pandas as pd
|
| 37 |
import requests
|
| 38 |
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
# derived from 2022 historical zonal load reports.
|
| 43 |
-
# Sum is 1.0; values reflect typical share by zone.
|
| 44 |
-
ZONE_PROPORTIONS = np.array([
|
| 45 |
-
0.064, # ME
|
| 46 |
-
0.080, # NH
|
| 47 |
-
0.045, # VT
|
| 48 |
-
0.205, # CT
|
| 49 |
-
0.070, # RI
|
| 50 |
-
0.130, # SEMA
|
| 51 |
-
0.115, # WCMA
|
| 52 |
-
0.291, # NEMA_BOST (largest --- Boston metro)
|
| 53 |
-
], dtype=np.float32)
|
| 54 |
-
assert abs(ZONE_PROPORTIONS.sum() - 1.0) < 1e-3
|
| 55 |
|
| 56 |
ASSETS_DIR = Path(__file__).parent / "assets"
|
| 57 |
SAMPLE_CSV = ASSETS_DIR / "sample_demand_2022.csv"
|
| 58 |
-
SAMPLE_CSV_LONG = ASSETS_DIR / "sample_demand_2022_long.csv"
|
| 59 |
|
| 60 |
-
# In-memory cache: {
|
| 61 |
_CACHE: dict = {}
|
| 62 |
-
_CACHE_TTL_SECONDS = 300
|
| 63 |
-
|
| 64 |
-
logger = logging.getLogger(__name__)
|
| 65 |
|
|
| 66 |
|
| 67 |
-
def _cache_key(end_dt: datetime) -> str:
|
| 68 |
-
return end_dt.strftime("%Y-%m-%dT%H:00")
|
| 69 |
|
| 70 |
|
| 71 |
-
EIA_API_URL = "https://api.eia.gov/v2/electricity/rto/region-data/data/"
|
| 72 |
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
Requires the env var ``EIA_API_KEY`` (registered free at
|
| 78 |
-
https://www.eia.gov/opendata/register.php and exposed to this
|
| 79 |
-
Space as a Secret).
|
| 80 |
-
|
| 81 |
-
Returns ``(hours, 8)`` MWh array on success, ``None`` on any failure
|
| 82 |
-
(no key, HTTP error, missing rows, parse error).
|
| 83 |
-
"""
|
| 84 |
-
key = os.environ.get("EIA_API_KEY", "").strip()
|
| 85 |
-
if not key:
|
| 86 |
-
return None
|
| 87 |
-
try:
|
| 88 |
-
# EIA returns data on hour-ending convention; pull a generous
|
| 89 |
-
# window so we can clip the freshest `hours` hours.
|
| 90 |
-
start = (end_dt - timedelta(hours=hours + 6)).strftime("%Y-%m-%dT%H")
|
| 91 |
-
end = end_dt.strftime("%Y-%m-%dT%H")
|
| 92 |
-
params = {
|
| 93 |
-
"api_key": key,
|
| 94 |
-
"frequency": "hourly",
|
| 95 |
-
"data[0]": "value",
|
| 96 |
-
"facets[respondent][]": "ISNE",
|
| 97 |
-
"facets[type][]": "D", # 'D' = demand
|
| 98 |
-
"start": start,
|
| 99 |
-
"end": end,
|
| 100 |
-
"sort[0][column]": "period",
|
| 101 |
-
"sort[0][direction]": "desc",
|
| 102 |
-
"length": hours + 24,
|
| 103 |
-
}
|
| 104 |
-
r = requests.get(EIA_API_URL, params=params, timeout=8)
|
| 105 |
-
if r.status_code != 200:
|
| 106 |
-
logger.info("EIA API HTTP %d: %s", r.status_code, r.text[:200])
|
| 107 |
-
return None
|
| 108 |
-
payload = r.json()
|
| 109 |
-
rows = payload.get("response", {}).get("data", [])
|
| 110 |
-
if not rows:
|
| 111 |
-
return None
|
| 112 |
-
df = pd.DataFrame(rows)
|
| 113 |
-
if "period" not in df.columns or "value" not in df.columns:
|
| 114 |
-
return None
|
| 115 |
-
df["ts"] = pd.to_datetime(df["period"], utc=True, errors="coerce")
|
| 116 |
-
df = df.dropna(subset=["ts"]).sort_values("ts")
|
| 117 |
-
df["value"] = pd.to_numeric(df["value"], errors="coerce")
|
| 118 |
-
df = df.dropna(subset=["value"])
|
| 119 |
-
if len(df) < hours:
|
| 120 |
-
return None
|
| 121 |
-
last = df.tail(hours)["value"].to_numpy(dtype=np.float32)
|
| 122 |
-
return _split_to_zones(last)
|
| 123 |
-
except Exception as e: # noqa: BLE001
|
| 124 |
-
logger.info("EIA API fetch failed: %s", e)
|
| 125 |
-
return None
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
def _try_iso_ne_api(end_dt: datetime) -> Optional[np.ndarray]:
|
| 129 |
-
"""Backup: ISO-NE legacy wsclient endpoint.
|
| 130 |
-
|
| 131 |
-
Frequently returns HTTP 500 from outside their network, so this
|
| 132 |
-
is mostly a fallback after EIA. Returns ``(24, 8)`` MWh or ``None``.
|
| 133 |
-
"""
|
| 134 |
-
try:
|
| 135 |
-
url = "https://www.iso-ne.com/ws/wsclient"
|
| 136 |
-
params = {
|
| 137 |
-
"_nstmp_formDate": int(end_dt.timestamp() * 1000),
|
| 138 |
-
"_nstmp_startDate": (end_dt - timedelta(hours=25)).strftime("%m/%d/%Y"),
|
| 139 |
-
"_nstmp_endDate": end_dt.strftime("%m/%d/%Y"),
|
| 140 |
-
"_nstmp_chartName": "fuelmix",
|
| 141 |
-
}
|
| 142 |
-
r = requests.get(url, params=params, timeout=4)
|
| 143 |
-
if r.status_code != 200:
|
| 144 |
-
return None
|
| 145 |
-
data = r.json()
|
| 146 |
-
if not isinstance(data, list) or not data:
|
| 147 |
-
return None
|
| 148 |
-
df = pd.DataFrame(data)
|
| 149 |
-
if "BeginDate" not in df.columns or "GenMw" not in df.columns:
|
| 150 |
-
return None
|
| 151 |
-
df["ts"] = pd.to_datetime(df["BeginDate"])
|
| 152 |
-
hourly = df.groupby(df["ts"].dt.floor("h"))["GenMw"].sum().sort_index()
|
| 153 |
-
last24 = hourly.tail(24).values.astype(np.float32)
|
| 154 |
-
if len(last24) < 24:
|
| 155 |
-
return None
|
| 156 |
-
return _split_to_zones(last24)
|
| 157 |
-
except Exception as e: # noqa: BLE001
|
| 158 |
-
logger.info("ISO-NE API fetch failed: %s", e)
|
| 159 |
return None
|
| 160 |
|
| 161 |
|
| 162 |
-
def
|
| 163 |
-
|
| 164 |
-
return np.outer(system_total, ZONE_PROPORTIONS).astype(np.float32)
|
| 165 |
|
| 166 |
|
| 167 |
-
def
|
| 168 |
-
"""Fallback: read 24-hour slice from bundled CSV."""
|
| 169 |
df = pd.read_csv(SAMPLE_CSV)
|
| 170 |
arr = df[ZONE_COLS].tail(24).to_numpy(dtype=np.float32)
|
| 171 |
if arr.shape != (24, 8):
|
| 172 |
-
raise RuntimeError(
|
|
|
|
| 173 |
return arr
|
| 174 |
|
| 175 |
|
| 176 |
-
def
|
| 177 |
-
|
| 178 |
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
"
|
| 182 |
"""
|
| 183 |
if end_dt is None:
|
| 184 |
-
end_dt = datetime.now(timezone.utc)
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
cached =
|
| 188 |
if cached is not None:
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
return arr
|
| 197 |
|
| 198 |
-
arr = _try_iso_ne_api(end_dt)
|
| 199 |
-
if arr is not None:
|
| 200 |
-
_CACHE[key] = (datetime.now(timezone.utc), arr)
|
| 201 |
-
return arr.copy(), "live (ISO-NE)"
|
| 202 |
|
| 203 |
-
|
| 204 |
-
|
| 205 |
|
| 206 |
|
| 207 |
def fetch_long_history_mwh(end_dt: Optional[datetime] = None,
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
|
| 212 |
Strategy:
|
| 213 |
-
1.
|
| 214 |
-
2.
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
Returns:
|
| 218 |
-
(array of shape (hours, 8), source_label). source_label ends in
|
| 219 |
-
"+live" when the tail 24 h came from the API, "+sample" otherwise.
|
| 220 |
"""
|
| 221 |
if end_dt is None:
|
| 222 |
-
end_dt = datetime.now(timezone.utc)
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 1 |
"""
|
| 2 |
+
High-level ISO-NE per-zone demand fetcher for the Space.
|
| 3 |
|
| 4 |
+
Wraps the low-level fetcher in ``iso_ne_zonal.py`` with:
|
| 5 |
|
| 6 |
+
- In-memory cache (5-minute TTL) so repeated clicks within a few
|
| 7 |
+
minutes don't refetch from ISO-NE
|
| 8 |
+
- Optional bundled CSV fallback for offline / API-down scenarios
|
| 9 |
+
- Optional integration with a long-history CSV pulled from the data
|
| 10 |
+
repo at Space startup (used to seed Chronos context without
|
| 11 |
+
re-fetching 30 days of ISO-NE on every click)
|
| 12 |
|
| 13 |
+
Public API kept stable so ``app.py`` can swap from the old EIA-based
|
| 14 |
+
implementation without further changes:
|
|
|
|
| 15 |
|
| 16 |
+
- ``ZONE_COLS`` : list of 8 zone names
|
| 17 |
+
- ``fetch_recent_demand_mwh(end_dt)`` : (24, 8) MWh + source label
|
| 18 |
+
- ``fetch_long_history_mwh(end_dt, hours=720)`` : (hours, 8) MWh + label
|
| 19 |
"""
|
| 20 |
|
| 21 |
from __future__ import annotations
|
|
|
|
| 23 |
import logging
|
| 24 |
import os
|
| 25 |
from datetime import datetime, timedelta, timezone
|
| 26 |
+
from io import StringIO
|
| 27 |
from pathlib import Path
|
| 28 |
from typing import Optional
|
| 29 |
|
|
|
|
| 31 |
import pandas as pd
|
| 32 |
import requests
|
| 33 |
|
| 34 |
+
from iso_ne_zonal import ZONE_COLS, fetch_range, fetch_recent_hours
|
| 35 |
+
|
| 36 |
+
logger = logging.getLogger(__name__)
|
| 37 |
|
| 38 |
ASSETS_DIR = Path(__file__).parent / "assets"
|
| 39 |
SAMPLE_CSV = ASSETS_DIR / "sample_demand_2022.csv"
|
| 40 |
+
SAMPLE_CSV_LONG = ASSETS_DIR / "sample_demand_2022_long.csv"
|
| 41 |
|
| 42 |
+
# In-memory cache: { ("recent", end_hour) | ("long", end_hour, hours) -> (ts, np.ndarray) }
|
| 43 |
_CACHE: dict = {}
|
| 44 |
+
_CACHE_TTL_SECONDS = 300
|
|
|
|
|
|
|
| 45 |
|
| 46 |
+
# Path of the data-repo 30-day CSV (refreshed daily by GitHub Actions in
|
| 47 |
+
# new-england-real-time-power-predict-data; downloaded by app.py at
|
| 48 |
+
# startup and saved to /tmp). When present, fetch_long_history_mwh
|
| 49 |
+
# uses it as the base and splices in the last 1-2 days from live API.
|
| 50 |
+
DATA_REPO_30D_CSV_PATH = Path(os.environ.get(
|
| 51 |
+
"DATA_REPO_30D_CSV_PATH", "/tmp/iso_ne_30d.csv"))
|
| 52 |
|
|
|
|
|
|
|
| 53 |
|
| 54 |
+
def _hour_floor_utc(dt: datetime) -> datetime:
|
| 55 |
+
if dt.tzinfo is None:
|
| 56 |
+
dt = dt.replace(tzinfo=timezone.utc)
|
| 57 |
+
return dt.astimezone(timezone.utc).replace(
|
| 58 |
+
minute=0, second=0, microsecond=0, tzinfo=None)
|
| 59 |
|
|
|
|
| 60 |
|
| 61 |
+
def _cache_get(key: tuple) -> Optional[np.ndarray]:
|
| 62 |
+
cached = _CACHE.get(key)
|
| 63 |
+
if cached is None:
|
| 64 |
return None
|
| 65 |
+
ts, arr = cached
|
| 66 |
+
if (datetime.now(timezone.utc) - ts).total_seconds() < _CACHE_TTL_SECONDS:
|
| 67 |
+
return arr.copy()
|
| 68 |
+
return None
|
| 69 |
|
| 70 |
|
| 71 |
+
def _cache_put(key: tuple, arr: np.ndarray) -> None:
|
| 72 |
+
_CACHE[key] = (datetime.now(timezone.utc), arr.copy())
|
|
|
|
| 73 |
|
| 74 |
|
| 75 |
+
def _load_sample_recent() -> np.ndarray:
|
|
|
|
| 76 |
df = pd.read_csv(SAMPLE_CSV)
|
| 77 |
arr = df[ZONE_COLS].tail(24).to_numpy(dtype=np.float32)
|
| 78 |
if arr.shape != (24, 8):
|
| 79 |
+
raise RuntimeError(
|
| 80 |
+
f"Bundled sample_demand_2022.csv has wrong shape {arr.shape}")
|
| 81 |
return arr
|
| 82 |
|
| 83 |
|
| 84 |
+
def _load_sample_long(hours: int) -> np.ndarray:
|
| 85 |
+
if SAMPLE_CSV_LONG.exists():
|
| 86 |
+
df = pd.read_csv(SAMPLE_CSV_LONG)
|
| 87 |
+
arr = df[ZONE_COLS].tail(hours).to_numpy(dtype=np.float32)
|
| 88 |
+
if arr.shape == (hours, 8):
|
| 89 |
+
return arr
|
| 90 |
+
short = _load_sample_recent()
|
| 91 |
+
return np.tile(short, (hours // 24 + 1, 1))[:hours].astype(np.float32)
|
| 92 |
+
|
| 93 |
|
| 94 |
+
def fetch_recent_demand_mwh(end_dt: Optional[datetime] = None
|
| 95 |
+
) -> tuple[np.ndarray, str]:
|
| 96 |
+
"""Return ``(24, 8)`` MWh for the most recent 24 contiguous hours
|
| 97 |
+
ending at ``end_dt`` (or now). Source label starts with one of:
|
| 98 |
+
- ``"live (ISO-NE 5-min zonal"`` (followed by the latest hour and lag)
|
| 99 |
+
- ``"cached"``
|
| 100 |
+
- ``"sample-2022"`` (bundled CSV fallback)
|
| 101 |
"""
|
| 102 |
if end_dt is None:
|
| 103 |
+
end_dt = datetime.now(timezone.utc)
|
| 104 |
+
end_dt = _hour_floor_utc(end_dt)
|
| 105 |
+
cache_key = ("recent", end_dt)
|
| 106 |
+
cached = _cache_get(cache_key)
|
| 107 |
if cached is not None:
|
| 108 |
+
return cached, "cached"
|
| 109 |
+
try:
|
| 110 |
+
arr, latest = fetch_recent_hours(end_dt, hours=24)
|
| 111 |
+
_cache_put(cache_key, arr)
|
| 112 |
+
lag_hours = (end_dt - latest).total_seconds() / 3600
|
| 113 |
+
label = f"live (ISO-NE 5-min zonal, latest hour {latest.isoformat()}"
|
| 114 |
+
label += f", lag {lag_hours:.0f}h)" if lag_hours > 0 else ")"
|
| 115 |
+
return arr, label
|
| 116 |
+
except Exception as e: # noqa: BLE001
|
| 117 |
+
logger.warning("ISO-NE realtime fetch failed: %s; falling back to bundled CSV", e)
|
| 118 |
+
return _load_sample_recent(), "sample-2022 (ISO-NE unreachable)"
|
| 119 |
|
| 120 |
|
| 121 |
+
def _load_30d_base() -> Optional[pd.DataFrame]:
|
| 122 |
+
"""Load data-repo's pre-built 30-day per-zone CSV if available."""
|
| 123 |
+
if not DATA_REPO_30D_CSV_PATH.exists():
|
| 124 |
+
return None
|
| 125 |
+
try:
|
| 126 |
+
df = pd.read_csv(DATA_REPO_30D_CSV_PATH, parse_dates=["timestamp_utc"])
|
| 127 |
+
df = df.set_index("timestamp_utc").sort_index()
|
| 128 |
+
return df[ZONE_COLS]
|
| 129 |
+
except Exception as e: # noqa: BLE001
|
| 130 |
+
logger.warning("Failed to load 30d base CSV at %s: %s",
|
| 131 |
+
DATA_REPO_30D_CSV_PATH, e)
|
| 132 |
+
return None
|
| 133 |
|
| 134 |
|
| 135 |
def fetch_long_history_mwh(end_dt: Optional[datetime] = None,
|
| 136 |
+
hours: int = 720
|
| 137 |
+
) -> tuple[np.ndarray, str]:
|
| 138 |
+
"""Return ``(hours, 8)`` MWh of per-zone history ending at ``end_dt - 1h``.
|
| 139 |
|
| 140 |
Strategy:
|
| 141 |
+
1. If the data repo's 30d base CSV is present, start from it.
|
| 142 |
+
2. Otherwise fall back to the bundled long-history CSV.
|
| 143 |
+
3. Always splice the last ~24-48 hours from the live ISO-NE API
|
| 144 |
+
so the tail is fresh.
|
| 145 |
"""
|
| 146 |
if end_dt is None:
|
| 147 |
+
end_dt = datetime.now(timezone.utc)
|
| 148 |
+
end_dt = _hour_floor_utc(end_dt)
|
| 149 |
+
cache_key = ("long", end_dt, hours)
|
| 150 |
+
cached = _cache_get(cache_key)
|
| 151 |
+
if cached is not None:
|
| 152 |
+
return cached, "cached"
|
| 153 |
+
|
| 154 |
+
target_end = end_dt - timedelta(hours=1) # last hour we want
|
| 155 |
+
target_start = target_end - timedelta(hours=hours - 1)
|
| 156 |
+
|
| 157 |
+
base = _load_30d_base()
|
| 158 |
+
base_label = "data-repo 30d"
|
| 159 |
+
|
| 160 |
+
if base is None:
|
| 161 |
+
long_arr = _load_sample_long(hours)
|
| 162 |
+
out = long_arr
|
| 163 |
+
_cache_put(cache_key, out)
|
| 164 |
+
return out, "sample-2022 (no data-repo CSV)"
|
| 165 |
+
|
| 166 |
+
# Try to splice live ISO-NE for the last 2 days for freshness
|
| 167 |
+
splice_label = ""
|
| 168 |
+
try:
|
| 169 |
+
live = fetch_range(target_end - timedelta(days=2), target_end,
|
| 170 |
+
hourly=True)
|
| 171 |
+
# Overwrite overlapping rows in `base` with `live`
|
| 172 |
+
base.update(live)
|
| 173 |
+
splice_label = " + live splice"
|
| 174 |
+
except Exception as e: # noqa: BLE001
|
| 175 |
+
logger.info("Live splice into long history failed: %s", e)
|
| 176 |
+
|
| 177 |
+
# Ensure we have continuous coverage; if base doesn't reach back to
|
| 178 |
+
# target_start, use the bundled long CSV for the whole window instead
|
| 179 |
+
if base.index.min() > target_start:
|
| 180 |
+
logger.info("30d base starts at %s, missing %s -> %s; padding from sample",
|
| 181 |
+
base.index.min(), target_start, base.index.min())
|
| 182 |
+
sample_long = _load_sample_long(hours)
|
| 183 |
+
out, base_label, splice_label = sample_long, "sample-2022 (data-repo CSV too short)", ""
|
| 184 |
+
else:
|
| 185 |
+
# Slice exact window
|
| 186 |
+
idx = pd.date_range(start=target_start, end=target_end, freq="1h")
|
| 187 |
+
sliced = base.reindex(idx)
|
| 188 |
+
if sliced.isna().any().any():
|
| 189 |
+
logger.info("30d base has %d NaN rows in window; interpolating",
|
| 190 |
+
int(sliced.isna().any(axis=1).sum()))
|
| 191 |
+
sliced = sliced.interpolate(method="time", limit=12).ffill().bfill()
|
| 192 |
+
out = sliced[ZONE_COLS].to_numpy(dtype=np.float32)
|
| 193 |
+
|
| 194 |
+
_cache_put(cache_key, out)
|
| 195 |
+
return out, base_label + splice_label
|
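The 5-minute in-memory cache used by `_cache_get` / `_cache_put` above can be sketched in isolation. This is a minimal illustration, not the Space's code: the `TTLCache` class, its injectable `clock` parameter, and the eviction of expired entries are assumptions added here so the expiry behaviour can be exercised without waiting five minutes. Plain lists stand in for the NumPy arrays.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after ``ttl_seconds``.

    Hypothetical helper for illustration; the diff above uses a module-level
    dict (_CACHE) with two functions instead of a class.
    """

    def __init__(self, ttl_seconds: float = 300,
                 clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc)):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._store: dict = {}

    def put(self, key: tuple, value: list) -> None:
        # Store a copy so later mutation by the caller cannot corrupt the cache.
        self._store[key] = (self._clock(), list(value))

    def get(self, key: tuple) -> Optional[list]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if (self._clock() - stored_at).total_seconds() < self._ttl:
            return list(value)   # hand back a copy, as arr.copy() does above
        del self._store[key]     # assumption: evict expired entries
        return None
```

Keying on a tuple such as `("recent", end_hour)` or `("long", end_hour, hours)` keeps the 24-hour and long-history results from colliding, which is the same layout the cache comment in the diff describes.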
iso_ne_zonal.py
ADDED
|
@@ -0,0 +1,239 @@
|
|
| 1 |
+
"""
|
| 2 |
+
Real-time ISO-NE per-zone demand fetcher (no auth required).
|
| 3 |
+
|
| 4 |
+
Endpoint: https://www.iso-ne.com/transform/csv/fiveminuteestimatedzonalload
|
| 5 |
+
Returns 5-minute estimated load for all 8 ISO-NE zones; we roll up to
|
| 6 |
+
hourly (mean of 12 5-min observations) to match the model's input format.
|
| 7 |
+
|
| 8 |
+
Required trick: the endpoint returns HTTP 403 to direct curl, but accepts
|
| 9 |
+
the request once a session has visited a normal page first (cookie-prime
|
| 10 |
+
pattern, borrowed from the gridstatus.io library at
|
| 11 |
+
gridstatus/isone.py:_make_request).
|
| 12 |
+
|
| 13 |
+
Zone IDs (ISO-NE locational tags) -> our column names:
|
| 14 |
+
4001 .Z.MAINE -> ME
|
| 15 |
+
4002 .Z.NEWHAMPSHIRE -> NH
|
| 16 |
+
4003 .Z.VERMONT -> VT
|
| 17 |
+
4004 .Z.CONNECTICUT -> CT
|
| 18 |
+
4005 .Z.RHODEISLAND -> RI
|
| 19 |
+
4006 .Z.SEMASS -> SEMA
|
| 20 |
+
4007 .Z.WCMASS -> WCMA
|
| 21 |
+
4008 .Z.NEMASSBOST -> NEMA_BOST
|
| 22 |
+
|
| 23 |
+
Data publication delay is roughly 1 day: at 19:31 EDT today the CSV for
|
| 24 |
+
yesterday is fully populated; intra-day data may be missing recent hours
|
| 25 |
+
near the wall-clock present. The fetcher always asks for whole UTC days
|
| 26 |
+
and the caller is responsible for trimming to the desired range.
|
| 27 |
+
"""
|
| 28 |
+
|
| 29 |
+
from __future__ import annotations
|
| 30 |
+
|
| 31 |
+
import csv
|
| 32 |
+
import io
|
| 33 |
+
import logging
|
| 34 |
+
from datetime import datetime, timedelta, timezone
|
| 35 |
+
from typing import Optional
|
| 36 |
+
|
| 37 |
+
import numpy as np
|
| 38 |
+
import pandas as pd
|
| 39 |
+
import requests
|
| 40 |
+
|
| 41 |
+
ZONE_COLS = ["ME", "NH", "VT", "CT", "RI", "SEMA", "WCMA", "NEMA_BOST"]
|
| 42 |
+
|
| 43 |
+
# ISO-NE locational identifiers (zone IDs in the public CSV)
|
| 44 |
+
_ZONE_ID_TO_COL = {
|
| 45 |
+
4001: "ME",
|
| 46 |
+
4002: "NH",
|
| 47 |
+
4003: "VT",
|
| 48 |
+
4004: "CT",
|
| 49 |
+
4005: "RI",
|
| 50 |
+
4006: "SEMA",
|
| 51 |
+
4007: "WCMA",
|
| 52 |
+
4008: "NEMA_BOST",
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
_ZONAL_URL = "https://www.iso-ne.com/transform/csv/fiveminuteestimatedzonalload"
|
| 56 |
+
_PRIME_URL = "https://www.iso-ne.com/isoexpress/web/reports/operations/-/tree/gen-fuel-mix"
|
| 57 |
+
|
| 58 |
+
logger = logging.getLogger(__name__)
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def _new_session() -> requests.Session:
|
| 62 |
+
"""Return a requests.Session that has cookies primed for ISO-NE."""
|
| 63 |
+
s = requests.Session()
|
| 64 |
+
s.headers.update({
|
| 65 |
+
"User-Agent": "Mozilla/5.0 (compatible; predict-power/1.0; "
|
| 66 |
+
"https://github.com/jeffliulab/real-time-power-predict)",
|
| 67 |
+
})
|
| 68 |
+
s.get(_PRIME_URL, timeout=10)
|
| 69 |
+
return s
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
def _parse_csv(text: str) -> pd.DataFrame:
|
| 73 |
+
"""Parse ISO-NE's quoted-CSV format (rows prefixed with C/H/D markers).
|
| 74 |
+
|
| 75 |
+
Returns a DataFrame with columns: timestamp_utc, zone_id, zone_name,
|
| 76 |
+
native_load_mw, btm_solar_mw.
|
| 77 |
+
"""
|
| 78 |
+
data_rows = [line for line in text.splitlines() if line.startswith('"D"')]
|
| 79 |
+
if not data_rows:
|
| 80 |
+
raise RuntimeError("ISO-NE CSV had no data rows")
|
| 81 |
+
parsed = list(csv.reader(data_rows, quotechar='"'))
|
| 82 |
+
df = pd.DataFrame(parsed, columns=[
|
| 83 |
+
"row_type", "datetime", "zone_id", "zone_name",
|
| 84 |
+
"native_load_mw", "btm_solar_mw",
|
| 85 |
+
])
|
| 86 |
+
# ISO-NE timestamps in the CSV are local time without TZ marker but
|
| 87 |
+
# are documented as Eastern Prevailing Time. Localize then convert.
|
| 88 |
+
ts_local = pd.to_datetime(df["datetime"]).dt.tz_localize(
|
| 89 |
+
"US/Eastern", nonexistent="shift_forward", ambiguous="infer",
|
| 90 |
+
)
|
| 91 |
+
df["timestamp_utc"] = ts_local.dt.tz_convert("UTC").dt.tz_localize(None)
|
| 92 |
+
df["zone_id"] = df["zone_id"].astype(int)
|
| 93 |
+
df["native_load_mw"] = df["native_load_mw"].astype(float)
|
| 94 |
+
return df[["timestamp_utc", "zone_id", "zone_name", "native_load_mw"]]
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
def fetch_one_day(date: datetime, session: Optional[requests.Session] = None,
|
| 98 |
+
timeout: int = 20) -> pd.DataFrame:
|
| 99 |
+
"""Fetch one calendar day of 5-minute per-zone estimated load.
|
| 100 |
+
|
| 101 |
+
Args:
|
| 102 |
+
date: any datetime; only the date portion (Eastern local) is used.
|
| 103 |
+
session: optional pre-primed session for batched fetches.
|
| 104 |
+
|
| 105 |
+
Returns:
|
| 106 |
+
Wide DataFrame indexed by timestamp_utc with one column per zone
|
| 107 |
+
(ME, NH, ..., NEMA_BOST). Values are 5-minute average MW and are kept
|
| 108 |
+
in MW here; the hourly mean taken in ``fetch_range`` is numerically
|
| 109 |
+
equal to the hourly energy in MWh.
|
| 110 |
+
"""
|
| 111 |
+
own_session = session is None
|
| 112 |
+
if own_session:
|
| 113 |
+
session = _new_session()
|
| 114 |
+
date_str = date.strftime("%Y%m%d")
|
| 115 |
+
url = f"{_ZONAL_URL}?start={date_str}&end={date_str}"
|
| 116 |
+
r = session.get(url, timeout=timeout)
|
| 117 |
+
if r.status_code != 200:
|
| 118 |
+
raise RuntimeError(
|
| 119 |
+
f"ISO-NE zonal fetch failed: HTTP {r.status_code} for {url}")
|
| 120 |
+
if "text/csv" not in r.headers.get("Content-Type", "").lower():
|
| 121 |
+
raise RuntimeError(
|
| 122 |
+
f"ISO-NE zonal fetch returned non-CSV: {r.headers.get('Content-Type')}")
|
| 123 |
+
long_df = _parse_csv(r.content.decode("utf8"))
|
| 124 |
+
long_df["zone"] = long_df["zone_id"].map(_ZONE_ID_TO_COL)
|
| 125 |
+
if long_df["zone"].isna().any():
|
| 126 |
+
unknown = long_df.loc[long_df["zone"].isna(), "zone_id"].unique().tolist()
|
| 127 |
+
raise RuntimeError(f"Unknown zone IDs in ISO-NE response: {unknown}")
|
| 128 |
+
wide = long_df.pivot_table(
|
| 129 |
+
index="timestamp_utc", columns="zone", values="native_load_mw",
|
| 130 |
+
aggfunc="first")
|
| 131 |
+
wide = wide[ZONE_COLS] # canonical column order
|
| 132 |
+
wide.index.name = "timestamp_utc"
|
| 133 |
+
return wide
|
| 134 |
+
|
| 135 |
+
|
| 136 |
+
def fetch_range(start_date: datetime, end_date: datetime,
|
| 137 |
+
hourly: bool = True) -> pd.DataFrame:
|
| 138 |
+
"""Fetch 5-minute (or hourly-rolled) per-zone load over an inclusive
|
| 139 |
+
date range [start_date, end_date].
|
| 140 |
+
|
| 141 |
+
Args:
|
| 142 |
+
start_date / end_date: datetimes; only the date portion is used.
|
| 143 |
+
Both endpoints are inclusive.
|
| 144 |
+
hourly: if True (default), aggregate 12 5-min bins per hour to
|
| 145 |
+
the hourly mean (matches model input format). If False, return
|
| 146 |
+
the raw 5-minute resolution.
|
| 147 |
+
|
| 148 |
+
Returns:
|
| 149 |
+
DataFrame with timestamp_utc index and 8 zone columns.
|
| 150 |
+
"""
|
| 151 |
+
if start_date.tzinfo is not None:
|
| 152 |
+
start_date = start_date.astimezone(timezone.utc).replace(tzinfo=None)
|
| 153 |
+
if end_date.tzinfo is not None:
|
| 154 |
+
end_date = end_date.astimezone(timezone.utc).replace(tzinfo=None)
|
| 155 |
+
|
| 156 |
+
session = _new_session()
|
| 157 |
+
parts = []
|
| 158 |
+
cur = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
|
| 159 |
+
end = end_date.replace(hour=0, minute=0, second=0, microsecond=0)
|
| 160 |
+
while cur <= end:
|
| 161 |
+
try:
|
| 162 |
+
parts.append(fetch_one_day(cur, session=session))
|
| 163 |
+
except Exception as e: # noqa: BLE001
|
| 164 |
+
logger.warning("ISO-NE fetch for %s failed: %s", cur.date(), e)
|
| 165 |
+
cur += timedelta(days=1)
|
| 166 |
+
|
| 167 |
+
if not parts:
|
| 168 |
+
raise RuntimeError(
|
| 169 |
+
f"ISO-NE fetch returned no data for range "
|
| 170 |
+
f"{start_date.date()} -> {end_date.date()}")
|
| 171 |
+
|
| 172 |
+
df = pd.concat(parts).sort_index()
|
| 173 |
+
df = df[~df.index.duplicated(keep="last")]
|
| 174 |
+
if not hourly:
|
| 175 |
+
return df
|
| 176 |
+
|
| 177 |
+
hourly_df = df.resample("1h").mean(numeric_only=True)
|
| 178 |
+
hourly_df = hourly_df[ZONE_COLS]
|
| 179 |
+
return hourly_df
|
| 180 |
+
|
| 181 |
+
|
| 182 |
+
def fetch_recent_hours(end_dt: datetime, hours: int = 24,
|
| 183 |
+
max_lookback_days: int = 3
|
| 184 |
+
) -> tuple[np.ndarray, datetime]:
|
| 185 |
+
"""Return ``(hours, 8)`` MW array of the most recent complete hours.
|
| 186 |
+
|
| 187 |
+
ISO-NE 5-min zonal data has ~1-2 hour publication lag. This helper
|
| 188 |
+
looks back from ``end_dt`` (rounded down to the hour) and finds the
|
| 189 |
+
latest contiguous window of ``hours`` complete hours of per-zone data
|
| 190 |
+
among the last ``max_lookback_days`` UTC dates.
|
| 191 |
+
|
| 192 |
+
Returns:
|
| 193 |
+
(array of shape (hours, 8) float32, latest_timestamp_in_window).
|
| 194 |
+
|
| 195 |
+
Raises RuntimeError if there isn't a contiguous ``hours``-window in
|
| 196 |
+
the last ``max_lookback_days``.
|
| 197 |
+
"""
|
| 198 |
+
if end_dt.tzinfo is None:
|
| 199 |
+
end_dt = end_dt.replace(tzinfo=timezone.utc)
|
| 200 |
+
end_dt = end_dt.astimezone(timezone.utc).replace(
|
| 201 |
+
minute=0, second=0, microsecond=0, tzinfo=None)
|
| 202 |
+
|
| 203 |
+
fetch_start = end_dt - timedelta(days=max_lookback_days)
|
| 204 |
+
df = fetch_range(fetch_start, end_dt, hourly=True)
|
| 205 |
+
df = df.dropna() # only fully-populated hours
|
| 206 |
+
if len(df) < hours:
|
| 207 |
+
raise RuntimeError(
|
| 208 |
+
f"ISO-NE has only {len(df)} complete hourly rows in the last "
|
| 209 |
+
f"{max_lookback_days} days; need {hours}.")
|
| 210 |
+
|
| 211 |
+
# Find the latest `hours`-length stretch. Gaps are NOT allowed here;
|
| 212 |
+
# we want strictly contiguous data.
|
| 213 |
+
df = df.sort_index()
|
| 214 |
+
contig_end = df.index[-1]
|
| 215 |
+
contig_start = contig_end - timedelta(hours=hours - 1)
|
| 216 |
+
window = df.loc[contig_start:contig_end]
|
| 217 |
+
if len(window) != hours:
|
| 218 |
+
raise RuntimeError(
|
| 219 |
+
f"ISO-NE: last {hours} hours not contiguous "
|
| 220 |
+
f"(got {len(window)} of {hours} expected, latest={contig_end}).")
|
| 221 |
+
return window[ZONE_COLS].to_numpy(dtype=np.float32), contig_end.to_pydatetime()
|
| 222 |
+
|
| 223 |
+
|
| 224 |
+
if __name__ == "__main__":
|
| 225 |
+
# Smoke test
|
| 226 |
+
logging.basicConfig(level=logging.INFO, format="%(message)s")
|
| 227 |
+
yesterday = (datetime.now(timezone.utc) - timedelta(days=1))
|
| 228 |
+
print(f"Fetching one day of ISO-NE per-zone load for "
|
| 229 |
+
f"{yesterday.date()} (UTC)...")
|
| 230 |
+
df = fetch_one_day(yesterday)
|
| 231 |
+
print(f" shape={df.shape}, columns={list(df.columns)}")
|
| 232 |
+
print(f" first row: {df.iloc[0].to_dict()}")
|
| 233 |
+
print()
|
| 234 |
+
print("Fetching last 24 contiguous hours...")
|
| 235 |
+
arr, latest = fetch_recent_hours(datetime.now(timezone.utc), hours=24)
|
| 236 |
+
print(f" shape={arr.shape}, latest_timestamp={latest}")
|
| 237 |
+
print(f" sum_at_t0={arr.sum(axis=1)[0]:.0f} MW")
|
| 238 |
+
print(f" zone means: "
|
| 239 |
+
+ ", ".join(f"{z}={arr[:, i].mean():.0f}" for i, z in enumerate(ZONE_COLS)))
|
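The roll-up from 5-minute readings to hourly values, followed by the strict contiguity check in `fetch_recent_hours`, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the module's implementation: it handles a single zone, the function names `hourly_means` and `latest_contiguous` are hypothetical, and the timestamp format `%Y-%m-%d %H:%M:%S` is assumed rather than taken from ISO-NE documentation. It is also stricter than the pandas version: `resample("1h").mean()` averages whatever readings are present in an hour, while this sketch keeps only hours that have all 12 readings.

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_means(text: str, readings_per_hour: int = 12) -> dict:
    """Map hour start -> mean MW, keeping only hours with every 5-min reading."""
    # Only rows whose first field is the "D" marker carry data.
    data_rows = [ln for ln in text.splitlines() if ln.startswith('"D"')]
    buckets = defaultdict(list)
    for _marker, stamp, _zone_id, _zone_name, load_mw, *_rest in csv.reader(data_rows):
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")  # assumed format
        buckets[ts.replace(minute=0, second=0)].append(float(load_mw))
    return {hour: sum(vals) / len(vals)
            for hour, vals in sorted(buckets.items())
            if len(vals) == readings_per_hour}


def latest_contiguous(hours_map: dict, n: int) -> list:
    """Return the last ``n`` hourly values, or raise if they contain a gap."""
    if len(hours_map) < n:
        raise RuntimeError(f"only {len(hours_map)} complete hours; need {n}")
    window = sorted(hours_map)[-n:]
    # n distinct hour-aligned stamps are gap-free exactly when they span n-1 hours.
    if window[-1] - window[0] != timedelta(hours=n - 1):
        raise RuntimeError(f"last {n} hours are not contiguous")
    return [hours_map[h] for h in window]
```

Because each reading is an average power in MW over its interval, the mean of the 12 readings in an hour equals the energy delivered in that hour in MWh, which is why the hourly mean can be fed to a model that expects MWh.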
model_utils.py
CHANGED
|
@@ -68,11 +68,22 @@ def denormalize_demand(z: np.ndarray, norm_stats: dict) -> np.ndarray:
|
|
| 68 |
return (z * std + mean).astype(np.float32)
|
| 69 |
|
| 70 |
|
| 71 |
def synthetic_weather_z(history_len: int = HISTORY_LEN,
|
| 72 |
future_len: int = FUTURE_LEN) -> np.ndarray:
|
| 73 |
"""Return a (S+24, H, W, C) array of zeros (training-mean weather
|
| 74 |
-
in z-score space).
|
| 75 |
-
|
|
|
|
| 76 |
return np.zeros((history_len + future_len, WEATHER_H, WEATHER_W, WEATHER_C),
|
| 77 |
dtype=np.float32)
|
| 78 |
|
|
@@ -83,20 +94,38 @@ def run_forecast(model: torch.nn.Module,
|
|
| 83 |
hist_cal: np.ndarray,
|
| 84 |
future_cal: np.ndarray,
|
| 85 |
norm_stats: dict,
|
|
|
|
|
|
|
| 86 |
device: str = "cpu") -> np.ndarray:
|
| 87 |
-
"""Run the baseline
|
| 88 |
|
| 89 |
Args:
|
| 90 |
-
hist_demand_mwh:
|
| 91 |
-
hist_cal:
|
| 92 |
-
future_cal:
|
| 93 |
|
| 94 |
Returns:
|
| 95 |
(24, 8) forecast in MWh.
|
| 96 |
"""
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
|
| 101 |
hist_y_z = normalize_demand(hist_demand_mwh, norm_stats)
|
| 102 |
hist_y = torch.from_numpy(hist_y_z).unsqueeze(0).to(device)
|
|
|
|
| 68 |
return (z * std + mean).astype(np.float32)
|
| 69 |
|
| 70 |
|
| 71 |
+
def normalize_weather(raster: np.ndarray, norm_stats: dict) -> np.ndarray:
|
| 72 |
+
"""(T, H, W, 7) raw HRRR -> (T, H, W, 7) z-scored using training stats.
|
| 73 |
+
|
| 74 |
+
norm_stats stores per-channel mean/std as (1, 1, 1, 7) tensors.
|
| 75 |
+
"""
|
| 76 |
+
mean = norm_stats["weather_mean"].cpu().numpy().reshape(1, 1, 1, -1)
|
| 77 |
+
std = norm_stats["weather_std"].cpu().numpy().reshape(1, 1, 1, -1)
|
| 78 |
+
return ((raster - mean) / std).astype(np.float32)
|
| 79 |
+
|
| 80 |
+
|
| 81 |
def synthetic_weather_z(history_len: int = HISTORY_LEN,
|
| 82 |
future_len: int = FUTURE_LEN) -> np.ndarray:
|
| 83 |
"""Return a (S+24, H, W, C) array of zeros (training-mean weather
|
| 84 |
+
in z-score space). Kept as a fallback when the live HRRR fetcher
|
| 85 |
+
fails (e.g. no network, S3 outage); the model is degraded but still
|
| 86 |
+
produces calibrated output from demand + calendar."""
|
| 87 |
return np.zeros((history_len + future_len, WEATHER_H, WEATHER_W, WEATHER_C),
|
| 88 |
dtype=np.float32)
|
| 89 |
|
|
|
|
| 94 |
hist_cal: np.ndarray,
|
| 95 |
future_cal: np.ndarray,
|
| 96 |
norm_stats: dict,
|
| 97 |
+
hist_weather_raw: np.ndarray,
|
| 98 |
+
future_weather_raw: np.ndarray,
|
| 99 |
device: str = "cpu") -> np.ndarray:
|
| 100 |
+
"""Run the baseline forecast.
|
| 101 |
|
| 102 |
Args:
|
| 103 |
+
hist_demand_mwh: (24, 8) recent ISO-NE per-zone demand in MWh.
|
| 104 |
+
hist_cal: (24, 44) calendar features for the history window.
|
| 105 |
+
future_cal: (24, 44) calendar features for the next 24 h.
|
| 106 |
+
hist_weather_raw: (24, 450, 449, 7) RAW HRRR f00 analyses for the
|
| 107 |
+
history window. Will be z-scored internally.
|
| 108 |
+
future_weather_raw: (24, 450, 449, 7) RAW HRRR f01..f24 forecasts
|
| 109 |
+
(or analyses, if available) for the future
|
| 110 |
+
window. Will be z-scored internally.
|
| 111 |
|
| 112 |
Returns:
|
| 113 |
(24, 8) forecast in MWh.
|
| 114 |
"""
|
| 115 |
+
if hist_weather_raw.shape != (HISTORY_LEN, WEATHER_H, WEATHER_W, WEATHER_C):
|
| 116 |
+
raise ValueError(
|
| 117 |
+
f"hist_weather_raw shape {hist_weather_raw.shape} != "
|
| 118 |
+
f"({HISTORY_LEN}, {WEATHER_H}, {WEATHER_W}, {WEATHER_C})")
|
| 119 |
+
if future_weather_raw.shape != (FUTURE_LEN, WEATHER_H, WEATHER_W, WEATHER_C):
|
| 120 |
+
raise ValueError(
|
| 121 |
+
f"future_weather_raw shape {future_weather_raw.shape} != "
|
| 122 |
+
f"({FUTURE_LEN}, {WEATHER_H}, {WEATHER_W}, {WEATHER_C})")
|
| 123 |
+
|
| 124 |
+
hist_w_z = normalize_weather(hist_weather_raw, norm_stats)
|
| 125 |
+
fut_w_z = normalize_weather(future_weather_raw, norm_stats)
|
| 126 |
+
|
| 127 |
+
hist_w = torch.from_numpy(hist_w_z).unsqueeze(0).to(device)
|
| 128 |
+
fut_w = torch.from_numpy(fut_w_z).unsqueeze(0).to(device)
|
| 129 |
|
| 130 |
hist_y_z = normalize_demand(hist_demand_mwh, norm_stats)
|
| 131 |
hist_y = torch.from_numpy(hist_y_z).unsqueeze(0).to(device)
|
packages.txt
ADDED
|
@@ -0,0 +1,2 @@
|
|
| 1 |
+
libeccodes-dev
|
| 2 |
+
libeccodes-tools
|
requirements.txt
CHANGED
|
@@ -2,9 +2,16 @@ gradio>=5.30,<6
|
|
| 2 |
torch>=2.5,<3
|
| 3 |
numpy>=1.26,<3
|
| 4 |
pandas>=2.0
|
|
|
|
| 5 |
plotly>=5.18
|
| 6 |
requests>=2.31
|
| 7 |
-
|
| 8 |
-
#
|
| 9 |
-
# the
|
| 10 |
chronos-forecasting>=1.5,<2
|
|
|
|
| 2 |
torch>=2.5,<3
|
| 3 |
numpy>=1.26,<3
|
| 4 |
pandas>=2.0
|
| 5 |
+
scipy>=1.11
|
| 6 |
plotly>=5.18
|
| 7 |
requests>=2.31
|
| 8 |
+
|
| 9 |
+
# Real-time HRRR weather (fetched on-demand for live forecasts and as
|
| 10 |
+
# the input to the rolling backtest cache that ships from the data repo).
|
| 11 |
+
herbie-data>=2024.1
|
| 12 |
+
cfgrib>=0.9.10
|
| 13 |
+
xarray>=2024.0
|
| 14 |
+
eccodes>=2.40
|
| 15 |
+
|
| 16 |
+
# Chronos-Bolt foundation model for the per-zone ensemble path.
|
| 17 |
chronos-forecasting>=1.5,<2
|