KSvend Claude Opus 4.6 (1M context) committed on
Commit
804eb65
·
1 Parent(s): 52dc51f

Add EO product analytical upgrade design spec

Covers native resolution, seasonal baselines, pixel-level change
detection, cross-indicator correlation, confidence model overhaul,
and report/visualization improvements for all 4 indicators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs/superpowers/specs/2026-04-06-eo-product-overhaul-design.md ADDED
@@ -0,0 +1,301 @@
1
+ # EO Product Analytical Upgrade: Design Spec
2
+
3
+ **Date:** 2026-04-06
4
+ **Scope:** Improve all 4 EO indicators (NDVI, Water, SAR, Settlement) across data pipeline, analysis, and reporting.
5
+ **Approach:** Analytical Upgrade (Approach B from brainstorming)
6
+
7
+ ---
8
+
9
+ ## 1. OpenEO Data Pipeline Changes
10
+
11
+ ### Resolution Upgrade
12
+
13
+ | Indicator | Current | Target | Bands | Rationale |
14
+ |---|---|---|---|---|
15
+ | NDVI | 100m | 10m | B04 (10m), B08 (10m) | Both bands are native 10m |
16
+ | Water (MNDWI) | 100m | 20m | B03 (10m), B11 (20m) | Downsample B03 to 20m; upsampling B11 to 10m would introduce artifacts |
17
+ | Settlement (NDBI) | 100m | 20m | B11 (20m), B08 (10m) | Same B11 constraint as Water |
18
+ | SAR Backscatter | 100m | 10m | VV, VH | Sentinel-1 GRD native ~10m |
19
+
20
+ Changes in `openeo_client.py`:
21
+ - Remove the blanket `resample_spatial(resolution=100)` step from each processing graph.
22
+ - NDVI graph: keep native 10m for both bands.
23
+ - Water/Settlement graphs: resample B03/B08 to 20m to match B11, compute index at 20m.
24
+ - SAR graph: keep native 10m.
25
+
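The band-to-resolution rules above can be sketched as a small lookup. This is an illustration only: the dictionary and function names are hypothetical and are not taken from `openeo_client.py`; the band resolutions are those listed in the table.

```python
# Hypothetical configuration; names are illustrative, not from openeo_client.py.
NATIVE_RES_M = {"B03": 10, "B04": 10, "B08": 10, "B11": 20, "VV": 10, "VH": 10}

INDICATOR_BANDS = {
    "ndvi": ["B04", "B08"],
    "water": ["B03", "B11"],
    "settlement": ["B11", "B08"],
    "sar": ["VV", "VH"],
}


def target_resolution(indicator: str) -> int:
    """Coarsest native resolution among the indicator's bands, in metres."""
    return max(NATIVE_RES_M[band] for band in INDICATOR_BANDS[indicator])


def bands_to_resample(indicator: str) -> list[str]:
    """Bands finer than the target grid, which must be resampled down to it."""
    target = target_resolution(indicator)
    return [b for b in INDICATOR_BANDS[indicator] if NATIVE_RES_M[b] < target]
```

Under these assumptions, `bands_to_resample("water")` yields only B03, and NDVI and SAR need no resampling at all.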
26
+ ### Temporal Coverage Fix
27
+
28
+ The current period must deliver **12 monthly median composites** (one per calendar month in the analysis year). The baseline period must deliver **60 monthly median composites** (5 years x 12 months).
29
+
30
+ Root cause investigation: the pipeline currently yields a single composite instead of 12. This likely stems from how `aggregate_temporal_period("month")` interacts with the requested time range, or from how bands are extracted in the harvest step. The fix must ensure:
31
+ - The openEO temporal aggregation produces one band per calendar month.
32
+ - The harvest step correctly identifies and labels each band by its calendar month.
33
+ - Missing months (due to cloud cover) are recorded as no-data, not silently dropped.
34
+
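A minimal sketch of the "record missing months as no-data" requirement, assuming the harvest step can produce a mapping from calendar month (1 to 12) to an AOI-level value. The function names and the input shape are hypothetical.

```python
import math


def fill_missing_months(composites: dict[int, float]) -> list[float]:
    """Return 12 values ordered Jan..Dec; months absent from the input become NaN."""
    unknown = set(composites) - set(range(1, 13))
    if unknown:
        raise ValueError(f"not calendar months: {sorted(unknown)}")
    return [composites.get(month, math.nan) for month in range(1, 13)]


def valid_month_count(series: list[float]) -> int:
    """Number of months that carry real data (used later for temporal coverage)."""
    return sum(not math.isnan(value) for value in series)
```

Keeping the series at a fixed length of 12 means a cloudy month shows up as an explicit gap rather than shifting later months into the wrong slot.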
35
+ ### Cloud Fraction Tracking
36
+
37
+ For each monthly composite, track the **mean number of valid (cloud-free) observations** that contributed to the median.
38
+
39
+ Implementation: in the openEO processing graph, before aggregating to monthly median, count valid observations per pixel per month. Store the AOI-mean observation count per month as metadata alongside the composite values.
40
+
41
+ If this is not feasible within a single openEO batch job, compute it as a second lightweight job or estimate it from the SCL band statistics.
42
+
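If the SCL fallback is used, the AOI-mean observation count could be derived roughly as follows. This is a sketch under assumptions: the SCL rasters for one month are already stacked into a NumPy array, and classes 4 to 7 (vegetation, not vegetated, water, unclassified) are treated as valid. The choice of valid classes is an assumption, not something this spec fixes.

```python
import numpy as np

# Assumption: SCL classes counted as cloud-free observations.
VALID_SCL_CLASSES = (4, 5, 6, 7)


def mean_valid_observations(scl_stack: np.ndarray) -> float:
    """AOI-mean count of valid observations for one month.

    scl_stack has shape (n_acquisitions, height, width) and holds SCL codes.
    """
    valid = np.isin(scl_stack, VALID_SCL_CLASSES)
    per_pixel_count = valid.sum(axis=0)  # valid acquisitions per pixel
    return float(per_pixel_count.mean())
```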
43
+ ---
44
+
45
+ ## 2. Seasonal Baselines
46
+
47
+ ### Monthly Baseline Statistics
48
+
49
+ For each of the 12 calendar months, compute from the 5-year baseline:
50
+ - **Median** (primary reference value)
51
+ - **Mean**
52
+ - **Standard deviation**
53
+ - **Min and max** (for envelope visualization)
54
+
55
+ These are computed **per pixel** for spatial analysis and **per AOI** (mean of pixel values) for time series charts and narrative.
56
+
57
+ ### Storage
58
+
59
+ Per-pixel seasonal stats are kept as in-memory numpy arrays during processing. AOI-level stats are stored in the `IndicatorResult` chart data structure.
60
+
61
+ The baseline GeoTIFF already contains 60 bands (monthly composites). The harvest step groups these by calendar month (bands 1, 13, 25, 37, 49 = all Januarys, etc.) and computes statistics across the 5 values per month.
62
+
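The band grouping can be expressed as a reshape. The sketch below assumes the 60 bands are in chronological order starting in January, and that missing data is stored as NaN; the function name is hypothetical.

```python
import numpy as np


def monthly_baseline_stats(stack: np.ndarray) -> dict[str, np.ndarray]:
    """Per-pixel seasonal statistics from a (n_years * 12, height, width) stack.

    Assumes chronological band order starting in January, so band i belongs to
    calendar month i % 12. Each returned array has shape (12, height, width).
    """
    n_bands, height, width = stack.shape
    if n_bands % 12 != 0:
        raise ValueError("expected a whole number of years of monthly bands")
    by_year = stack.reshape(n_bands // 12, 12, height, width)
    return {
        "median": np.nanmedian(by_year, axis=0),
        "mean": np.nanmean(by_year, axis=0),
        "std": np.nanstd(by_year, axis=0),  # population std (ddof=0)
        "min": np.nanmin(by_year, axis=0),
        "max": np.nanmax(by_year, axis=0),
    }
```

With only five samples per month, the choice between population and sample standard deviation changes the z-scores noticeably, so it may be worth fixing explicitly in the implementation.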
63
+ ### Anomaly Scoring
64
+
65
+ For each current month with valid data:
66
+ ```
67
+ z_score = (current_value - baseline_month_mean) / baseline_month_std
68
+ ```
69
+
70
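+ Where `baseline_month_std` is below an indicator-specific minimum threshold, use the threshold as the denominator instead (the same `max(std, min_std_threshold)` floor used for the Section 3 z-score raster). This bounds the z-score and avoids division-by-near-zero noise. Suggested minimum std thresholds: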
71
+ - NDVI: 0.02
72
+ - Water fraction: 0.01
73
+ - SAR VV: 0.5 dB
74
+ - Settlement fraction: 0.01
75
+
76
+ A month is flagged as a **significant anomaly** when `|z_score| > 2.0`.
77
+
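A sketch of the AOI-level anomaly scoring, with the standard deviation floored at the indicator's minimum threshold. The function name is hypothetical; months without data are assumed to be NaN and are never flagged.

```python
import numpy as np


def monthly_z_scores(current, baseline_mean, baseline_std, min_std, threshold=2.0):
    """Return (z_scores, anomaly_flags) for aligned monthly series.

    The std is floored at min_std to avoid division-by-near-zero noise.
    NaN inputs give NaN z-scores, which compare False and are not flagged.
    """
    current = np.asarray(current, dtype=float)
    baseline_mean = np.asarray(baseline_mean, dtype=float)
    safe_std = np.maximum(np.asarray(baseline_std, dtype=float), min_std)
    z = (current - baseline_mean) / safe_std
    flags = np.abs(z) > threshold
    return z, flags
```

For example, an NDVI month of 0.30 against a baseline mean of 0.40 and a raw std of 0.001 is scored with the 0.02 floor, giving z of about -5 rather than -100.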
78
+ ---
79
+
80
+ ## 3. Pixel-Level Change Analysis
81
+
82
+ ### Change and Z-Score Rasters
83
+
84
+ For each indicator, for the most recent month with valid data:
85
+ - **Change raster:** `current_pixel - baseline_month_median_pixel`
86
+ - **Z-score raster:** `(current_pixel - baseline_month_mean_pixel) / max(baseline_month_std_pixel, min_std_threshold)`
87
+
88
+ These rasters are at native resolution (10m or 20m depending on indicator).
89
+
90
+ ### Hotspot Detection
91
+
92
+ Pixels where `|z_score| > 2.0` are classified as hotspots:
93
+ - Positive z-score: increase (e.g., vegetation gain, water increase, settlement growth)
94
+ - Negative z-score: decrease (e.g., vegetation loss, water retreat, settlement contraction)
95
+
96
+ Metrics derived from hotspots:
97
+ - **Percentage of AOI affected:** count of hotspot pixels / total valid pixels
98
+ - **Directional split:** % of AOI with significant increase vs significant decrease
99
+
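The two hotspot metrics can be computed directly from the z-score raster. A minimal sketch, assuming no-data pixels are NaN; the function and key names are hypothetical.

```python
import numpy as np


def hotspot_metrics(z_raster: np.ndarray, threshold: float = 2.0) -> dict[str, float]:
    """Percentage of valid pixels that are hotspots, split by direction."""
    valid = np.isfinite(z_raster)
    n_valid = int(valid.sum())
    if n_valid == 0:
        return {"hotspot_pct": 0.0, "increase_pct": 0.0, "decrease_pct": 0.0}
    z = np.where(valid, z_raster, 0.0)  # no-data pixels can never be hotspots
    n_increase = int((z > threshold).sum())
    n_decrease = int((z < -threshold).sum())
    return {
        "hotspot_pct": 100.0 * (n_increase + n_decrease) / n_valid,
        "increase_pct": 100.0 * n_increase / n_valid,
        "decrease_pct": 100.0 * n_decrease / n_valid,
    }
```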
100
+ ### Spatial Clustering
101
+
102
+ Apply connected-component labeling (scipy.ndimage.label) to the hotspot mask. For each cluster:
103
+ - Area in hectares
104
+ - Centroid (lat/lon)
105
+ - Mean z-score
106
+
107
+ Report the **top 3 clusters by area** in the narrative. This gives spatial specificity ("largest area of vegetation decline is a 45 ha patch centered at 2.14N, 37.84E").
108
+
109
+ Minimum cluster size filter: ignore clusters smaller than 4 pixels (0.16 ha at 20m, 0.04 ha at 10m) to reduce noise from edge effects and cloud mask boundaries.
110
+
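The clustering step might look like the sketch below. It uses `scipy.ndimage.label` with its default 4-connectivity, and reports centroids in pixel (row, column) coordinates; converting those to lat/lon needs the raster's geotransform and is left out. The function and key names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def summarize_clusters(z_raster, pixel_size_m, threshold=2.0, min_pixels=4, top_n=3):
    """Return the top_n hotspot clusters by area, ignoring clusters below min_pixels."""
    z = np.where(np.isfinite(z_raster), z_raster, 0.0)
    hotspot_mask = np.abs(z) > threshold
    labels, n_clusters = ndimage.label(hotspot_mask)  # default: 4-connected
    pixel_ha = (pixel_size_m ** 2) / 10_000.0  # 1 ha = 10,000 m^2
    clusters = []
    for cluster_id in range(1, n_clusters + 1):
        rows, cols = np.nonzero(labels == cluster_id)
        if rows.size < min_pixels:
            continue  # noise filter
        clusters.append({
            "area_ha": rows.size * pixel_ha,
            "centroid_rowcol": (float(rows.mean()), float(cols.mean())),
            "mean_z": float(z[rows, cols].mean()),
        })
    clusters.sort(key=lambda c: c["area_ha"], reverse=True)
    return clusters[:top_n]
```

Because the mask combines increases and decreases, adjacent gain and loss pixels end up in one cluster with a diluted mean z-score. Labeling the positive and negative masks separately would avoid that, if the distinction matters for the narrative.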
111
+ ### Storage
112
+
113
+ Change and z-score rasters are held in memory for map rendering. They can optionally be included as GeoTIFFs in the data package for users who want to load them into a GIS.
114
+
115
+ ---
116
+
117
+ ## 4. Cross-Indicator Correlation
118
+
119
+ ### Compound Signal Definitions
120
+
121
+ After all 4 indicators are processed, test for these co-occurrence patterns:
122
+
123
+ | Signal | Trigger Conditions | Interpretation |
124
+ |---|---|---|
125
+ | **Land conversion** | NDVI decline hotspots overlap with Settlement growth hotspots (>10% pixel overlap in affected areas) | Possible agricultural/vegetation loss to urbanization |
126
+ | **Flood event** | SAR VV decrease (z < -2) + Water extent increase (z > 2) in same month | Potential flooding or waterlogging |
127
+ | **Drought stress** | NDVI decline (z < -2) + Water decline (z < -2) + SAR increase (z > 2, drying soil) | Possible drought conditions |
128
+ | **Displacement pressure** | Settlement growth hotspots with NDVI decline hotspots in adjacent/surrounding pixels | Expansion into previously vegetated land |
129
+
130
+ ### Implementation
131
+
132
+ - Spatial overlap tests: resample all indicator z-score rasters to the coarsest common resolution (20m) for pixel-level comparison.
133
+ - For each pattern, compute: whether it triggered (boolean), spatial overlap percentage, affected area in hectares.
134
+ - Flood and drought patterns also check temporal co-occurrence (same month).
135
+
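A sketch of the spatial overlap test for two hotspot masks already resampled to the common 20m grid. The spec does not define the denominator for "overlap in affected areas"; the version below assumes intersection over union of the two masks, which is only one possible reading. Names are hypothetical.

```python
import numpy as np


def overlap_stats(mask_a: np.ndarray, mask_b: np.ndarray, pixel_size_m: float = 20.0):
    """Return (overlap_pct, affected_ha) for two boolean hotspot masks on the same grid.

    Assumption: overlap_pct = pixels in both masks / pixels in either mask.
    """
    both = mask_a & mask_b
    either = mask_a | mask_b
    n_both = int(both.sum())
    n_either = int(either.sum())
    overlap_pct = 100.0 * n_both / n_either if n_either else 0.0
    affected_ha = n_both * pixel_size_m ** 2 / 10_000.0
    return overlap_pct, affected_ha
```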
136
+ ### Confidence Tagging
137
+
138
+ Each triggered signal gets a confidence note:
139
+ - **Strong:** 3+ indicators agree, >20% spatial overlap
140
+ - **Moderate:** 2 indicators agree, 10-20% overlap
141
+ - **Weak:** 2 indicators agree, <10% overlap (mentioned but flagged as tentative)
142
+
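The tagging rules can be written as a small function. The three tiers above do not cover every combination (for example, 2 indicators with more than 20% overlap), so this sketch assumes that any case not qualifying as strong is graded on overlap alone. That fallback is an assumption, not part of the spec.

```python
def confidence_tag(n_indicators: int, overlap_pct: float) -> str:
    """Map indicator agreement and spatial overlap to a confidence note."""
    if n_indicators >= 3 and overlap_pct > 20.0:
        return "strong"
    # Assumption: combinations the spec does not enumerate are graded by overlap only.
    if overlap_pct >= 10.0:
        return "moderate"
    return "weak"
```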
143
+ ### Caveats
144
+
145
+ - No causal claims. Language: "suggests possible X", not "confirms X".
146
+ - Rule-based only. No ML or AI interpretation at this stage.
147
+ - Absence of compound signals is reported explicitly.
148
+
149
+ ---
150
+
151
+ ## 5. Confidence Model
152
+
153
+ ### Four-Factor Scoring
154
+
155
+ Each indicator gets a composite confidence score from four factors. Each factor is scored on a 0.0 to 1.0 scale, in the four steps (0.25, 0.5, 0.75, 1.0) defined below:
156
+
157
+ | Factor | Weight | 0.25 | 0.5 | 0.75 | 1.0 |
158
+ |---|---|---|---|---|---|
159
+ | Temporal coverage | 0.30 | 0-3 valid months | 4-6 months | 7-9 months | 10-12 months |
160
+ | Observation density | 0.20 | <3 obs/composite | 3-5 obs | 6-10 obs | >10 obs |
161
+ | Baseline depth | 0.30 | <2 years with data | 2-3 years | 4 years | 5 years |
162
+ | Spatial completeness | 0.20 | <50% valid pixels | 50-75% | 75-90% | >90% |
163
+
164
+ ### Composite Score
165
+
166
+ ```
167
+ confidence_score = temporal * 0.3 + observation_density * 0.2 + baseline_depth * 0.3 + spatial_completeness * 0.2
168
+ ```
169
+
170
+ Mapped to levels:
171
+ - **LOW:** score < 0.4
172
+ - **MODERATE:** 0.4 <= score <= 0.7
173
+ - **HIGH:** score > 0.7
174
+
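A sketch of the four-factor scoring and level mapping. The table leaves some bin edges ambiguous (for example, exactly 75% valid pixels, or a fractional mean of 5.5 observations); the edge handling below is an assumption, and all function names are hypothetical.

```python
WEIGHTS = {
    "temporal": 0.30,
    "observation_density": 0.20,
    "baseline_depth": 0.30,
    "spatial_completeness": 0.20,
}


def _bin(value, low, mid, high):
    """0.25 below `low`, 0.5 below `mid`, 0.75 up to and including `high`, else 1.0."""
    if value < low:
        return 0.25
    if value < mid:
        return 0.5
    return 0.75 if value <= high else 1.0


def confidence_factors(valid_months, mean_obs, baseline_years, valid_pixel_pct):
    return {
        "temporal": _bin(valid_months, 4, 7, 9),            # 0-3, 4-6, 7-9, 10-12
        "observation_density": _bin(mean_obs, 3, 6, 10),    # <3, 3-5, 6-10, >10
        "baseline_depth": _bin(baseline_years, 2, 4, 4),    # <2, 2-3, 4, 5
        "spatial_completeness": _bin(valid_pixel_pct, 50, 75, 90),
    }


def confidence_score(factors):
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


def confidence_level(score):
    if score < 0.4:
        return "LOW"
    return "MODERATE" if score <= 0.7 else "HIGH"
```

For instance, 12 valid months, 4 observations per composite, 5 baseline years, and 80% valid pixels give factors of 1.0, 0.5, 1.0, and 0.75, for a score of about 0.85 (HIGH).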
175
+ ### Data Requirements
176
+
177
+ - **Temporal coverage:** already available (count of non-empty bands in current GeoTIFF).
178
+ - **Observation density:** new, requires cloud fraction tracking from Section 1.
179
+ - **Baseline depth:** count how many of the 5 baseline years had valid data for each calendar month.
180
+ - **Spatial completeness:** compute from nodata mask of the current composite.
181
+
182
+ ---
183
+
184
+ ## 6. Report & Visualization Improvements
185
+
186
+ ### Time Series Charts (`charts.py`)
187
+
188
+ - Plot all 12 monthly data points (not 1).
189
+ - Seasonal baseline envelope: shaded min-max band that follows the calendar month curve.
190
+ - Baseline mean as dashed line tracking the seasonal shape.
191
+ - Anomalous months (|z| > 2.0) highlighted with a distinct marker (e.g., red ring).
192
+ - Y-axis with meaningful labels: "NDVI (0-1)", "Water extent (%)", "VV backscatter (dB)", "Built-up area (%)".
193
+ - X-axis: monthly labels (Jan 2025 to Dec 2025 or similar).
194
+
195
+ ### Spatial Maps (`maps.py`)
196
+
197
+ - Render at native resolution (10-20m).
198
+ - **Two maps per indicator:**
199
+   1. Current state map (as now, but sharper).
200
+   2. Change hotspot map: diverging colormap centered on zero (e.g., RdBu for NDVI: red = loss, blue = gain). Only pixels with |z| > 2.0 are colored; non-significant pixels are transparent over the true-color base.
201
+ - Colorbars with descriptive labels (e.g., "Vegetation loss <---> Vegetation gain").
202
+ - Consistent AOI boundary styling.
203
+
204
+ ### Narrative Fixes (`narrative.py`)
205
+
206
+ - Fix Settlement contradiction: align headline, trend label, and interpretation text to the actual sign and magnitude of change.
207
+ - Reference seasonal context: "March NDVI is 1.8 standard deviations below the March average" instead of "NDVI is -0.05 vs baseline".
208
+ - Use z-score language: "statistically significant decline" when |z| > 2, "within normal seasonal variation" otherwise.
209
+ - Compound signals integrated into situation assessment text.
210
+
211
+ ### Situation Assessment (page 1 of report)
212
+
213
+ - Summary table retains RED/AMBER/GREEN structure.
214
+ - Add column: **"Anomaly months"**, the count of months with |z| > 2 per indicator.
215
+ - Compound signals listed below the table if any triggered.
216
+
217
+ ### Compound Signals Section (new, between Situation Assessment and Indicator Detail)
218
+
219
+ - Each triggered signal: short paragraph describing what was detected, which indicators contributed, spatial extent, and what it might mean.
220
+ - If no signals triggered: explicit statement.
221
+
222
+ ### Technical Annex
223
+
224
+ - Confidence breakdown table showing the 4 factors per indicator.
225
+ - Updated methodology text reflecting native resolution, seasonal baselines, z-score anomaly detection.
226
+ - Updated limitations reflecting the higher resolution and seasonal approach.
227
+
228
+ ---
229
+
230
+ ## 7. Status Classification Updates
231
+
232
+ The current thresholds are based on absolute change values. With seasonal baselines, status should be driven by z-scores:
233
+
234
+ | Status | Condition |
235
+ |---|---|
236
+ | **GREEN** | Current month z-score within [-1, 1] AND <10% of AOI has significant hotspots |
237
+ | **AMBER** | z-score in [-2, -1] or [1, 2] OR 10-25% of AOI has significant hotspots |
238
+ | **RED** | z-score beyond [-2, 2] OR >25% of AOI has significant hotspots |
239
+
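The status table reduces to two checks, since AMBER is everything that is neither RED nor GREEN. A minimal sketch with a hypothetical function name:

```python
def classify_status(z_current: float, hotspot_pct: float) -> str:
    """Map the current-month z-score and hotspot share of the AOI to RED/AMBER/GREEN."""
    abs_z = abs(z_current)
    if abs_z > 2.0 or hotspot_pct > 25.0:
        return "RED"
    if abs_z <= 1.0 and hotspot_pct < 10.0:
        return "GREEN"
    return "AMBER"
```

Checking RED first means a quiet AOI-level z-score cannot mask widespread pixel-level hotspots.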
240
+ Trend is determined by the direction of anomalies over the analysis period:
241
+ - **STABLE:** majority of months within [-1, 1]
242
+ - **IMPROVING:** trend of increasing z-scores (for NDVI/Water); for SAR and Settlement, direction depends on context, and improvement means a return toward baseline
243
+ - **DETERIORATING:** sustained deviation away from baseline in the concerning direction (NDVI/Water declining; SAR showing persistent anomaly in either direction)
244
+
245
+ Note: For SAR, both VV increase (drying) and VV decrease (flooding) can be concerning. Status is driven by |z-score| magnitude regardless of direction; the narrative text clarifies whether the change suggests wetting or drying.
246
+
247
+ ---
248
+
249
+ ## 8. Data Model Changes
250
+
251
+ ### `IndicatorResult` (models.py)
252
+
253
+ New fields:
254
+ - `anomaly_months: int` - count of months with |z| > 2
255
+ - `z_score_current: float` - z-score for the most recent valid month
256
+ - `hotspot_pct: float` - percentage of AOI with significant change
257
+ - `confidence_factors: dict` - the 4-factor breakdown (temporal, observation_density, baseline_depth, spatial_completeness)
258
+
259
+ ### `CompoundSignal` (new model)
260
+
261
+ ```python
262
+ class CompoundSignal:
263
+     name: str # e.g., "land_conversion"
264
+     triggered: bool
265
+     confidence: str # "strong", "moderate", "weak"
266
+     description: str # human-readable interpretation
267
+     indicators: list[str] # contributing indicator IDs
268
+     overlap_pct: float # spatial overlap percentage
269
+     affected_ha: float # affected area in hectares
270
+ ```
271
+
272
+ ### Chart data structure
273
+
274
+ Expanded to include:
275
+ - `baseline_monthly_min: list[float]` - 12 values, one per calendar month
276
+ - `baseline_monthly_max: list[float]`
277
+ - `baseline_monthly_mean: list[float]`
278
+ - `anomaly_flags: list[bool]` - True for months where |z| > 2
279
+
280
+ ---
281
+
282
+ ## 9. What Is NOT In Scope
283
+
284
+ - No new indicators (extensibility preserved but no additions now).
285
+ - No HTML/interactive report (PDF only, as now).
286
+ - No Claude-powered narrative generation (template-based, as now, with better templates).
287
+ - No changes to the job submission API or frontend.
288
+ - No changes to the email notification flow.
289
+ - No changes to the database schema (SQLite job table unchanged; new fields go in the JSON result blob).
290
+
291
+ ---
292
+
293
+ ## 10. Risk and Mitigation
294
+
295
+ | Risk | Impact | Mitigation |
296
+ |---|---|---|
297
+ | Native resolution makes openEO jobs much slower/larger | Processing time increases significantly | Async delivery (user gets email). Monitor job sizes and add AOI size limits if needed. |
298
+ | 60-band baseline GeoTIFFs are large | Memory pressure during harvest | Process bands in chunks; don't load all 60 bands simultaneously. |
299
+ | CDSE may not have 5 full years of Sentinel-1 data for all regions | Baseline depth factor scores low | Graceful degradation: reduce baseline years where data is unavailable, reflect in confidence score. |
300
+ | Pixel-level z-score maps are noisy at edges / cloud boundaries | False hotspots | Apply minimum cluster size filter (e.g., ignore clusters < 4 pixels). |
301
+ | Cross-indicator resampling introduces alignment artifacts | Inaccurate overlap calculations | Use nearest-neighbor resampling to common grid; accept minor alignment error at 20m. |