Spaces: MERLx/Aperture


KSvend and Claude Opus 4.6 (1M context) committed 12 days ago

Commit 804eb65 (1 parent: 52dc51f)

Add EO product analytical upgrade design spec

Covers native resolution, seasonal baselines, pixel-level change
detection, cross-indicator correlation, confidence model overhaul,
and report/visualization improvements for all 4 indicators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (1)

docs/superpowers/specs/2026-04-06-eo-product-overhaul-design.md +301 -0


	@@ -0,0 +1,301 @@

+# EO Product Analytical Upgrade — Design Spec
+**Date:** 2026-04-06
+**Scope:** Improve all 4 EO indicators (NDVI, Water, SAR, Settlement) across data pipeline, analysis, and reporting.
+**Approach:** Analytical Upgrade (Approach B from brainstorming)
+---
+## 1. OpenEO Data Pipeline Changes
+### Resolution Upgrade
+| Indicator | Current | Target | Bands | Rationale |
+|---|---|---|---|---|
+| NDVI | 100m | 10m | B04 (10m), B08 (10m) | Both bands are native 10m |
+| Water (MNDWI) | 100m | 20m | B03 (10m), B11 (20m) | Resample B03 down to 20m; resampling B11 up to 10m would only interpolate and introduce artifacts |
+| Settlement (NDBI) | 100m | 20m | B11 (20m), B08 (10m) | Same B11 constraint as Water |
+| SAR Backscatter | 100m | 10m | VV, VH | Sentinel-1 GRD (IW mode) has 10m pixel spacing; effective resolution is coarser, about 20m x 22m |
+Changes in `openeo_client.py`:
+- Remove the blanket `resample_spatial(resolution=100)` step from each processing graph.
+- NDVI graph: keep native 10m for both bands.
+- Water/Settlement graphs: resample B03/B08 to 20m to match B11, compute index at 20m.
+- SAR graph: keep native 10m.
+### Temporal Coverage Fix
+The current period must deliver **12 monthly median composites** (one per calendar month in the analysis year). The baseline period must deliver **60 monthly median composites** (5 years x 12 months).
+Root cause investigation: the current pipeline yields a single composite where 12 are expected. This likely stems from how `aggregate_temporal_period("month")` interacts with the requested time range, or from how bands are extracted in the harvest step. The fix must ensure:
+- The openEO temporal aggregation produces one band per calendar month.
+- The harvest step correctly identifies and labels each band by its calendar month.
+- Missing months (due to cloud cover) are recorded as no-data, not silently dropped.
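A minimal sketch of the harvest-side labeling, in Python with NumPy. It assumes each band arrives with an ISO date label (for example `"2025-03-01"` or `"2025-03-01T00:00:00Z"`); the function name `label_monthly_bands` and that label format are hypothetical, not taken from `openeo_client.py`. Every calendar month gets a slot, and a month with no band becomes a NaN array instead of disappearing.

```python
from datetime import date

import numpy as np


def label_monthly_bands(
    bands: dict[str, np.ndarray], year: int, shape: tuple[int, int]
) -> list[tuple[date, np.ndarray]]:
    """Return exactly 12 (month_start, composite) pairs for `year`.

    `bands` maps a date label (hypothetical ISO format) to a 2-D composite.
    Months without a band are filled with NaN so they are recorded as
    no-data instead of being silently dropped.
    """
    by_month: dict[int, np.ndarray] = {}
    for label, array in bands.items():
        day = date.fromisoformat(label[:10])  # keep only the YYYY-MM-DD part
        if day.year != year:
            continue  # band belongs to another year; not part of this period
        if day.month in by_month:
            raise ValueError(f"more than one band for {day:%Y-%m}")
        by_month[day.month] = array.astype("float32")

    return [
        (date(year, month, 1), by_month.get(month, np.full(shape, np.nan, dtype="float32")))
        for month in range(1, 13)
    ]


bands = {"2025-01-01": np.ones((2, 2)), "2025-03-01T00:00:00Z": np.zeros((2, 2))}
monthly = label_monthly_bands(bands, 2025, (2, 2))
```

Raising on a duplicate month is a choice of this sketch: two bands for one month would otherwise overwrite each other without notice.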
+### Cloud Fraction Tracking
+For each monthly composite, track the **mean number of valid (cloud-free) observations** that contributed to the median.
+Implementation: in the openEO processing graph, before aggregating to monthly median, count valid observations per pixel per month. Store the AOI-mean observation count per month as metadata alongside the composite values.
+If this is not feasible within a single openEO batch job, compute it as a second lightweight job or estimate it from the SCL band statistics.
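The counting logic, whether it runs inside the openEO graph or on a downloaded SCL stack, can be sketched in NumPy. It assumes the per-acquisition SCL layer is available as a `(time, y, x)` array with a matching array of calendar months, and it treats SCL classes 4 (vegetation), 5 (not vegetated), 6 (water), 7 (unclassified) and 11 (snow/ice) as valid. That class set is an assumption the spec does not fix, and the function name is hypothetical.

```python
import numpy as np

# Assumed "clear" SCL classes: 4 vegetation, 5 not vegetated, 6 water,
# 7 unclassified, 11 snow/ice. Everything else (no data, defective, dark
# area, cloud shadow, cloud medium/high probability, thin cirrus) is invalid.
VALID_SCL = np.array([4, 5, 6, 7, 11])


def monthly_observation_counts(scl: np.ndarray, months: np.ndarray) -> dict[int, float]:
    """AOI-mean number of valid observations per pixel, per calendar month.

    scl:    (time, y, x) integer SCL stack, one slice per acquisition.
    months: (time,) calendar month (1-12) of each acquisition.
    """
    valid = np.isin(scl, VALID_SCL)  # (time, y, x) boolean
    counts = {}
    for month in range(1, 13):
        per_pixel = valid[months == month].sum(axis=0)  # valid obs per pixel
        counts[month] = float(per_pixel.mean())  # 0.0 if no acquisitions
    return counts
```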
+---
+## 2. Seasonal Baselines
+### Monthly Baseline Statistics
+For each of the 12 calendar months, compute from the 5-year baseline:
+- **Median** (primary reference value)
+- **Mean**
+- **Standard deviation**
+- **Min and max** (for envelope visualization)
+These are computed **per pixel** for spatial analysis and **per AOI** (mean of pixel values) for time series charts and narrative.
+### Storage
+Per-pixel seasonal stats are kept as in-memory numpy arrays during processing. AOI-level stats are stored in the `IndicatorResult` chart data structure.
+The baseline GeoTIFF already contains 60 bands (monthly composites). The harvest step groups these by calendar month (bands 1, 13, 25, 37, 49 = all Januarys, etc.) and computes statistics across the 5 values per month.
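A sketch of the calendar-month grouping, assuming the 60 bands are in chronological order starting with January of the first baseline year and that missing composites are NaN. Reshaping to `(years, 12, y, x)` places all Januarys in month slot 0, all Februarys in slot 1, and so on. The AOI-level figures here are computed from the AOI mean of each composite; averaging the per-pixel statistics is another reading of "mean of pixel values", and the two differ for min, max and std. Names are hypothetical.

```python
import numpy as np

REDUCERS = {
    "median": np.nanmedian,
    "mean": np.nanmean,
    "std": np.nanstd,  # population std (ddof=0); the spec does not say which
    "min": np.nanmin,
    "max": np.nanmax,
}


def seasonal_baseline(stack: np.ndarray, years: int = 5):
    """Monthly baseline statistics from a chronological (years * 12, y, x) stack.

    Assumes band 0 is January of the first baseline year and that no-data
    composites are NaN, so the nan-aware reducers skip them.
    Returns (per_pixel, per_aoi): per_pixel[name] has shape (12, y, x),
    per_aoi[name] has 12 values.
    """
    n_bands, height, width = stack.shape
    if n_bands != years * 12:
        raise ValueError(f"expected {years * 12} bands, got {n_bands}")
    # Axis 0 = baseline year, axis 1 = calendar month (0 = January).
    by_month = stack.reshape(years, 12, height, width)
    per_pixel = {name: f(by_month, axis=0) for name, f in REDUCERS.items()}
    # AOI value of each composite = mean over its valid pixels -> (years, 12).
    aoi_series = np.nanmean(by_month, axis=(2, 3))
    per_aoi = {name: f(aoi_series, axis=0) for name, f in REDUCERS.items()}
    return per_pixel, per_aoi
```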
+### Anomaly Scoring
+For each current month with valid data:
+```
+z_score = (current_value - baseline_month_mean) / baseline_month_std
+```
+Where `baseline_month_std` falls below an indicator-specific minimum, use the minimum in its place (i.e., divide by `max(baseline_month_std, min_std_threshold)`) so that near-zero variability does not inflate the z-score. Suggested minimum std thresholds:
+- NDVI: 0.02
+- Water fraction: 0.01
+- SAR VV: 0.5 dB
+- Settlement fraction: 0.01
+A month is flagged as a **significant anomaly** when `|z_score| > 2.0`.
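The monthly anomaly rule can be sketched as follows, flooring the standard deviation at the indicator minimum (the same `max(...)` used for the per-pixel z-score raster in Section 3). The dictionary keys `ndvi`, `water`, `sar_vv` and `settlement` are hypothetical indicator IDs.

```python
import numpy as np

# Minimum standard deviations from the list above; keys are hypothetical IDs.
MIN_STD = {"ndvi": 0.02, "water": 0.01, "sar_vv": 0.5, "settlement": 0.01}
ANOMALY_Z = 2.0


def monthly_z_scores(current, baseline_mean, baseline_std, indicator):
    """z-score per calendar month with the std floored at the indicator minimum.

    All three inputs are length-12 sequences. NaN in `current` (a month with
    no valid data) yields a NaN z-score, which is never flagged.
    """
    current = np.asarray(current, dtype=float)
    std = np.maximum(np.asarray(baseline_std, dtype=float), MIN_STD[indicator])
    z = (current - np.asarray(baseline_mean, dtype=float)) / std
    flags = np.abs(z) > ANOMALY_Z  # NaN compares False
    return z, flags
```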
+---
+## 3. Pixel-Level Change Analysis
+### Change and Z-Score Rasters
+For each indicator, for the most recent month with valid data:
+- **Change raster:** `current_pixel - baseline_month_median_pixel`
+- **Z-score raster:** `(current_pixel - baseline_month_mean_pixel) / max(baseline_month_std_pixel, min_std_threshold)`
+These rasters are at native resolution (10m or 20m depending on indicator).
+### Hotspot Detection
+Pixels where `|z_score| > 2.0` are classified as hotspots:
+- Positive z-score: increase (e.g., vegetation gain, water increase, settlement growth)
+- Negative z-score: decrease (e.g., vegetation loss, water retreat, settlement contraction)
+Metrics derived from hotspots:
+- **Percentage of AOI affected:** count of hotspot pixels / total valid pixels
+- **Directional split:** % of AOI with significant increase vs significant decrease
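A NumPy sketch of the change raster, the z-score raster and the hotspot percentages for one month. It assumes all inputs are already on the indicator's native grid with NaN marking invalid pixels; the function name and return keys are hypothetical.

```python
import numpy as np


def hotspot_metrics(current, base_median, base_mean, base_std, min_std, z_threshold=2.0):
    """Change raster, z-score raster and hotspot shares (%) for one month."""
    change = current - base_median
    z = (current - base_mean) / np.maximum(base_std, min_std)
    valid = ~np.isnan(z)
    n_valid = int(valid.sum())
    increase = valid & (z > z_threshold)
    decrease = valid & (z < -z_threshold)

    def pct(mask):
        # Share of valid pixels, not of every pixel in the AOI raster.
        return 100.0 * int(mask.sum()) / n_valid if n_valid else float("nan")

    return {
        "change": change,
        "z": z,
        "hotspot_pct": pct(increase | decrease),
        "increase_pct": pct(increase),
        "decrease_pct": pct(decrease),
    }
```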
+### Spatial Clustering
+Apply connected-component labeling (scipy.ndimage.label) to the hotspot mask. For each cluster:
+- Area in hectares
+- Centroid (lat/lon)
+- Mean z-score
+Report the **top 3 clusters by area** in the narrative. This gives spatial specificity ("largest area of vegetation decline is a 45 ha patch centered at 2.14N, 37.84E").
+Minimum cluster size filter: ignore clusters smaller than 4 pixels (0.16 ha at 20m, 0.04 ha at 10m) to reduce noise from edge effects and cloud mask boundaries.
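A sketch of the clustering step built on `scipy.ndimage.label`. Assumptions: the caller passes a single-direction mask (for example `z < -2` for decline), so a gain patch can never merge into a loss cluster; the `transform` tuple `(x0, dx, y0, dy)` is a hypothetical north-up geotransform; and centroids come out in the raster's CRS, so a UTM raster would still need reprojecting (e.g. with pyproj) to report lat/lon. The function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def top_clusters(mask, z, pixel_size_m, transform, min_pixels=4, top_n=3):
    """Largest connected hotspot clusters with area, centroid and mean z-score.

    mask:      boolean hotspot mask for one direction (e.g. z < -2)
    z:         z-score raster on the same grid
    transform: hypothetical (x0, dx, y0, dy) for the top-left pixel corner
    """
    labels, n = ndimage.label(mask)  # default structure: 4-connectivity
    if n == 0:
        return []
    index = np.arange(1, n + 1)
    sizes = np.bincount(labels.ravel())[1:]  # pixel count per label 1..n
    centroids = ndimage.center_of_mass(mask.astype(float), labels, index)
    mean_z = ndimage.mean(z, labels, index)
    x0, dx, y0, dy = transform

    clusters = []
    for i, size in enumerate(sizes):
        if size < min_pixels:
            continue  # drop small edge / cloud-boundary clusters
        row, col = centroids[i]
        clusters.append({
            "area_ha": float(size) * pixel_size_m ** 2 / 10_000,
            "centroid": (x0 + (col + 0.5) * dx, y0 + (row + 0.5) * dy),
            "mean_z": float(mean_z[i]),
        })
    clusters.sort(key=lambda c: c["area_ha"], reverse=True)
    return clusters[:top_n]
```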
+### Storage
+Change and z-score rasters are held in memory for map rendering. Optionally included as GeoTIFFs in the data package for users who want to bring them into GIS.
+---
+## 4. Cross-Indicator Correlation
+### Compound Signal Definitions
+After all 4 indicators are processed, test for these co-occurrence patterns:
+| Signal | Trigger Conditions | Interpretation |
+|---|---|---|
+| **Land conversion** | NDVI decline hotspots overlap with Settlement growth hotspots (>10% pixel overlap in affected areas) | Possible agricultural/vegetation loss to urbanization |
+| **Flood event** | SAR VV decrease (z < -2) + Water extent increase (z > 2) in same month | Potential flooding or waterlogging |
+| **Drought stress** | NDVI decline (z < -2) + Water decline (z < -2) + SAR VV decrease (z < -2; drier soil lowers backscatter) | Possible drought conditions |
+| **Displacement pressure** | Settlement growth hotspots with NDVI decline hotspots in adjacent/surrounding pixels | Expansion into previously vegetated land |
+### Implementation
+- Spatial overlap tests: resample all indicator z-score rasters to the coarsest common resolution (20m) for pixel-level comparison.
+- For each pattern, compute: whether it triggered (boolean), spatial overlap percentage, affected area in hectares.
+- Flood and drought patterns also check temporal co-occurrence (same month).
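The two spatial tests could look like the NumPy/SciPy sketch below. Several details are assumptions rather than spec: the 10m NDVI grid is taken to be aligned with the 20m grid, so nearest-neighbour downsampling keeps one pixel per 2x2 block; "overlap" is measured as a share of the smaller of the two hotspot sets, because the table does not fix the denominator; and "adjacent" is read as a ring of `ring_px` pixels around settlement growth (a hypothetical parameter). All function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def to_20m(raster_10m: np.ndarray) -> np.ndarray:
    """Nearest-neighbour downsample of an aligned 10m grid (one pixel per 2x2 block)."""
    return raster_10m[::2, ::2]


def overlap_pct(a: np.ndarray, b: np.ndarray) -> float:
    """Overlap as a percentage of the smaller hotspot set (assumed denominator)."""
    smaller = min(int(a.sum()), int(b.sum()))
    return 100.0 * int((a & b).sum()) / smaller if smaller else 0.0


def land_conversion(ndvi_z_10m, settle_z_20m, pixel_size_m=20.0, z=2.0):
    """NDVI decline hotspots overlapping Settlement growth hotspots."""
    ndvi_loss = to_20m(ndvi_z_10m) < -z
    settle_gain = settle_z_20m > z
    pct = overlap_pct(ndvi_loss, settle_gain)
    area_ha = float((ndvi_loss & settle_gain).sum()) * pixel_size_m ** 2 / 10_000
    return {"triggered": pct > 10.0, "overlap_pct": pct, "affected_ha": area_ha}


def displacement_pressure(ndvi_z_10m, settle_z_20m, ring_px=1, z=2.0):
    """Count NDVI decline pixels in a ring around (not inside) Settlement growth."""
    ndvi_loss = to_20m(ndvi_z_10m) < -z
    settle_gain = settle_z_20m > z
    ring = ndimage.binary_dilation(settle_gain, iterations=ring_px) & ~settle_gain
    return int((ring & ndvi_loss).sum())
```

`binary_dilation` with its default cross-shaped structure grows the mask by 4-connected neighbours per iteration, so `ring_px=1` means direct edge neighbours only.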
+### Confidence Tagging
+Each triggered signal gets a confidence note:
+- **Strong:** 3+ indicators agree, >20% spatial overlap
+- **Moderate:** 2 indicators agree, 10-20% overlap
+- **Weak:** 2 indicators agree, <10% overlap (mentioned but flagged as tentative)
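These three tiers leave some combinations open, for example 3 indicators with 15% overlap, or 2 indicators above 20%. One possible reading, sketched here as an assumption: "strong" requires both conditions, any other case with at least 10% overlap is "moderate", and everything else is "weak". The function name is hypothetical.

```python
def tag_confidence(n_indicators: int, overlap_pct: float) -> str:
    """Map indicator agreement and spatial overlap (%) to a confidence tag.

    Assumed fall-through: cases short of "strong" with >= 10% overlap are
    "moderate"; the remainder are "weak".
    """
    if n_indicators >= 3 and overlap_pct > 20.0:
        return "strong"
    if n_indicators >= 2 and overlap_pct >= 10.0:
        return "moderate"
    return "weak"
```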
+### Caveats
+- No causal claims. Language: "suggests possible X", not "confirms X".
+- Rule-based only. No ML or AI interpretation at this stage.
+- Absence of compound signals is reported explicitly.
+---
+## 5. Confidence Model
+### Four-Factor Scoring
+Each indicator gets a composite confidence score from four factors, each scored 0.25, 0.5, 0.75 or 1.0 according to the bins below:
+| Factor | Weight | Score 0.25 | Score 0.5 | Score 0.75 | Score 1.0 |
+|---|---|---|---|---|---|
+| Temporal coverage | 0.30 | 0-3 valid months | 4-6 months | 7-9 months | 10-12 months |
+| Observation density | 0.20 | <3 obs/composite | 3-5 obs | 6-10 obs | >10 obs |
+| Baseline depth | 0.30 | <2 years with data | 2-3 years | 4 years | 5 years |
+| Spatial completeness | 0.20 | <50% valid pixels | 50-75% | 75-90% | >90% |
+### Composite Score
+```
+confidence_score = temporal * 0.3 + observation_density * 0.2 + baseline_depth * 0.3 + spatial_completeness * 0.2
+```
+Mapped to levels:
+- **LOW:** score < 0.4
+- **MODERATE:** 0.4 <= score <= 0.7
+- **HIGH:** score > 0.7
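A sketch of the four-factor score and level mapping. Bin boundaries the table leaves open are resolved by assumption: a fractional mean observation count between bins (e.g. 5.5) falls into the lower bin, exactly 75% spatial completeness scores 0.75, and baseline depth is passed as a single year count (how the per-month counts are combined is not specified). The function and key names are hypothetical.

```python
# Each rule list is checked top-down; no match means the lowest score, 0.25.
FACTOR_RULES = {
    "temporal": [(lambda m: m >= 10, 1.0), (lambda m: m >= 7, 0.75), (lambda m: m >= 4, 0.5)],
    "observation_density": [(lambda o: o > 10, 1.0), (lambda o: o >= 6, 0.75), (lambda o: o >= 3, 0.5)],
    "baseline_depth": [(lambda y: y >= 5, 1.0), (lambda y: y >= 4, 0.75), (lambda y: y >= 2, 0.5)],
    "spatial_completeness": [(lambda p: p > 90, 1.0), (lambda p: p >= 75, 0.75), (lambda p: p >= 50, 0.5)],
}
WEIGHTS = {"temporal": 0.3, "observation_density": 0.2, "baseline_depth": 0.3, "spatial_completeness": 0.2}


def _factor_score(value: float, rules) -> float:
    for predicate, score in rules:
        if predicate(value):
            return score
    return 0.25


def confidence_score(inputs: dict[str, float]) -> tuple[float, str, dict[str, float]]:
    """Return (score, level, per-factor scores) for one indicator."""
    factors = {name: _factor_score(inputs[name], FACTOR_RULES[name]) for name in WEIGHTS}
    # Rounding guards against floating-point error at the 0.4 / 0.7 boundaries.
    score = round(sum(WEIGHTS[name] * factors[name] for name in WEIGHTS), 6)
    level = "LOW" if score < 0.4 else "HIGH" if score > 0.7 else "MODERATE"
    return score, level, factors


score, level, factors = confidence_score(
    {"temporal": 12, "observation_density": 4, "baseline_depth": 3, "spatial_completeness": 80}
)
```

In the example, 12 valid months, 4 observations per composite, 3 baseline years and 80% valid pixels give 0.3 + 0.1 + 0.15 + 0.15 = 0.7, which the inclusive MODERATE rule keeps out of HIGH.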
+### Data Requirements
+- **Temporal coverage:** already available (count of non-empty bands in current GeoTIFF).
+- **Observation density:** new, requires cloud fraction tracking from Section 1.
+- **Baseline depth:** count how many of the 5 baseline years had valid data for each calendar month.
+- **Spatial completeness:** compute from nodata mask of the current composite.
+---
+## 6. Report & Visualization Improvements
+### Time Series Charts (`charts.py`)
+- Plot all 12 monthly data points (not 1).
+- Seasonal baseline envelope: shaded min-max band that follows the calendar month curve.
+- Baseline mean as dashed line tracking the seasonal shape.
+- Anomalous months (|z| > 2.0) highlighted with a distinct marker (e.g., red ring).
+- Y-axis with meaningful labels: "NDVI (0-1)", "Water extent (%)", "VV backscatter (dB)", "Built-up area (%)".
+- X-axis: monthly labels (Jan 2025 — Dec 2025 or similar).
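A Matplotlib sketch of this chart layout. It assumes the chart data arrive as 12-value sequences (as in the chart data structure in Section 8) and that missing months are NaN, which Matplotlib leaves as gaps in the line. The function name, colors and figure size are placeholder choices, not taken from `charts.py`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for report rendering
import matplotlib.pyplot as plt
import numpy as np

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def seasonal_chart(current, base_min, base_max, base_mean, anomaly_flags, ylabel, year):
    """Monthly series over a seasonal baseline envelope, anomalies ringed in red."""
    x = np.arange(12)
    current = np.asarray(current, dtype=float)
    flags = np.asarray(anomaly_flags, dtype=bool)
    fig, ax = plt.subplots(figsize=(8, 3.5))
    ax.fill_between(x, base_min, base_max, color="0.85", label="Baseline min-max")
    ax.plot(x, base_mean, "--", color="0.4", label="Baseline mean")
    ax.plot(x, current, "o-", color="C0", label=str(year))
    # Hollow red rings drawn over the anomalous monthly points.
    ax.scatter(x[flags], current[flags], s=160, facecolors="none",
               edgecolors="red", linewidths=2, label="|z| > 2", zorder=3)
    ax.set_xticks(x)
    ax.set_xticklabels([f"{m} {year}" for m in MONTHS], rotation=45, ha="right")
    ax.set_ylabel(ylabel)
    ax.legend(loc="best", fontsize="small")
    fig.tight_layout()
    return fig
```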
+### Spatial Maps (`maps.py`)
+- Render at native resolution (10-20m).
+- **Two maps per indicator:**
+  1. Current state map (as now, but sharper).
+  2. Change hotspot map — diverging colormap centered on zero (e.g., RdBu for NDVI: red = loss, blue = gain). Only pixels with |z| > 2.0 are colored; non-significant pixels are transparent over the true-color base.
+- Colorbars with descriptive labels (e.g., "Vegetation loss <---> Vegetation gain").
+- Consistent AOI boundary styling.
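The transparent-hotspot overlay could be drawn with a masked array and a colormap whose "bad" color is fully transparent, as in this sketch. It assumes the true-color base has already been drawn on the same axes; the function name and the `vmax` of 5 (the z-score at which the color saturates) are hypothetical. With `RdBu`, negative values map to red and positive values to blue, matching red = loss and blue = gain.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm


def hotspot_layer(ax, z, z_threshold=2.0, vmax=5.0, cmap_name="RdBu"):
    """Draw only significant pixels on a diverging scale centred on zero.

    Non-significant and NaN pixels are masked and rendered transparent,
    so a base image drawn earlier on `ax` stays visible underneath.
    """
    significant = np.ma.masked_where(~(np.abs(z) > z_threshold), z)
    cmap = matplotlib.colormaps[cmap_name].copy()
    cmap.set_bad(alpha=0.0)  # masked cells fully transparent
    norm = TwoSlopeNorm(vmin=-vmax, vcenter=0.0, vmax=vmax)
    return ax.imshow(significant, cmap=cmap, norm=norm, interpolation="nearest")
```

A colorbar such as `fig.colorbar(im, ax=ax, label="Vegetation loss <---> Vegetation gain")` then labels the two ends of the scale.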
+### Narrative Fixes (`narrative.py`)
+- Fix Settlement contradiction: align headline, trend label, and interpretation text to the actual sign and magnitude of change.
+- Reference seasonal context: "March NDVI is 1.8 standard deviations below the March average" instead of "NDVI is -0.05 vs baseline".
+- Use z-score language: "statistically significant decline" when |z| > 2, "within normal seasonal variation" otherwise.
+- Compound signals integrated into situation assessment text.
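The seasonal-context wording could come from a small template function, sketched below. The function name, its parameters and the exact sentence template are illustrative assumptions modelled on the examples above.

```python
import calendar


def seasonal_sentence(indicator_label: str, month: int, z: float, threshold: float = 2.0) -> str:
    """Describe one month's z-score in seasonal terms, with a significance label."""
    month_name = calendar.month_name[month]
    direction = "above" if z >= 0 else "below"
    sentence = (f"{month_name} {indicator_label} is {abs(z):.1f} standard deviations "
                f"{direction} the {month_name} average")
    if abs(z) > threshold:
        kind = "increase" if z > 0 else "decline"
        return f"{sentence}, a statistically significant {kind}."
    return f"{sentence}, within normal seasonal variation."
```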
+### Situation Assessment (page 1 of report)
+- Summary table retains RED/AMBER/GREEN structure.
+- Add column: **"Anomaly months"** — count of months with |z| > 2 per indicator.
+- Compound signals listed below the table if any triggered.
+### Compound Signals Section (new, between Situation Assessment and Indicator Detail)
+- Each triggered signal: short paragraph describing what was detected, which indicators contributed, spatial extent, and what it might mean.
+- If no signals triggered: explicit statement.
+### Technical Annex
+- Confidence breakdown table showing the 4 factors per indicator.
+- Updated methodology text reflecting native resolution, seasonal baselines, z-score anomaly detection.
+- Updated limitations reflecting the higher resolution and seasonal approach.
+---
+## 7. Status Classification Updates
+The current thresholds are based on absolute change values. With seasonal baselines, status should be driven by z-scores:
+| Status | Condition |
+|---|---|
+| **GREEN** | Current month z-score within [-1, 1] AND <10% of AOI has significant hotspots |
+| **AMBER** | z-score in [-2, -1) or (1, 2] OR 10-25% of AOI has significant hotspots |
+| **RED** | z-score outside [-2, 2] OR >25% of AOI has significant hotspots (RED takes precedence when AMBER conditions also hold) |
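A direct encoding of the status table, assuming RED is evaluated first and that the current-month z-score is valid (not NaN). The function name is hypothetical. Because it uses the absolute z-score, the same function also covers SAR, where either direction can be concerning.

```python
def classify_status(z_current: float, hotspot_pct: float) -> str:
    """RED/AMBER/GREEN from the current-month z-score and hotspot share (%).

    RED is checked first, so a case matching both the AMBER and RED rules
    is RED. Assumes `z_current` is not NaN.
    """
    abs_z = abs(z_current)
    if abs_z > 2.0 or hotspot_pct > 25.0:
        return "RED"
    if abs_z > 1.0 or hotspot_pct >= 10.0:
        return "AMBER"
    return "GREEN"
```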
+Trend is determined by the direction of anomalies over the analysis period:
+- **STABLE:** majority of months within [-1, 1]
+- **IMPROVING:** trend of increasing z-scores for NDVI/Water; for SAR and Settlement, where the desirable direction depends on context, improvement means a return toward baseline
+- **DETERIORATING:** sustained deviation away from baseline in the concerning direction (NDVI/Water declining; SAR showing persistent anomaly in either direction)
+Note: For SAR, both a VV increase (e.g., wetter soil) and a VV decrease (e.g., open-water flooding or drier soil) can be concerning. Status is driven by |z-score| magnitude regardless of direction; the narrative text clarifies whether the change suggests wetting or drying.
+---
+## 8. Data Model Changes
+### `IndicatorResult` (models.py)
+New fields:
+- `anomaly_months: int` — count of months with |z| > 2
+- `z_score_current: float` — z-score for the most recent valid month
+- `hotspot_pct: float` — percentage of AOI with significant change
+- `confidence_factors: dict` — the 4-factor breakdown (temporal, observation_density, baseline_depth, spatial_completeness)
+### `CompoundSignal` (new model)
+```python
+class CompoundSignal:
+    name: str              # e.g., "land_conversion"
+    triggered: bool
+    confidence: str        # "strong", "moderate", "weak"
+    description: str       # human-readable interpretation
+    indicators: list[str]  # contributing indicator IDs
+    overlap_pct: float     # spatial overlap percentage
+    affected_ha: float     # affected area in hectares
+```
+### Chart data structure
+Expanded to include:
+- `baseline_monthly_min: list[float]` — 12 values, one per calendar month
+- `baseline_monthly_max: list[float]`
+- `baseline_monthly_mean: list[float]`
+- `anomaly_flags: list[bool]` — True for months where |z| > 2
+---
+## 9. What Is NOT In Scope
+- No new indicators (extensibility preserved but no additions now).
+- No HTML/interactive report (PDF only, as now).
+- No Claude-powered narrative generation (template-based, as now, with better templates).
+- No changes to the job submission API or frontend.
+- No changes to the email notification flow.
+- No changes to the database schema (SQLite job table unchanged; new fields go in the JSON result blob).
+---
+## 10. Risk and Mitigation
+| Risk | Impact | Mitigation |
+|---|---|---|
+| Native resolution makes openEO jobs much slower/larger | Processing time increases significantly | Async delivery (user gets email). Monitor job sizes and add AOI size limits if needed. |
+| 60-band baseline GeoTIFFs are large | Memory pressure during harvest | Process bands in chunks; don't load all 60 bands simultaneously. |
+| CDSE may not have 5 full years of Sentinel-1 data for all regions | Baseline depth factor scores low | Graceful degradation: reduce baseline years where data is unavailable, reflect in confidence score. |
+| Pixel-level z-score maps are noisy at edges / cloud boundaries | False hotspots | Apply minimum cluster size filter (e.g., ignore clusters < 4 pixels). |
+| Cross-indicator resampling introduces alignment artifacts | Inaccurate overlap calculations | Use nearest-neighbor resampling to a common grid; accept minor alignment error at 20m. |