KSvend Claude Happy committed on
Commit
f20ef5a
·
1 Parent(s): 9791720

docs: add openEO batch jobs implementation plan


9 tasks covering: submit_as_batch helper, BaseIndicator batch interface,
NDVI submit/harvest, three-phase worker, E2E tests, live verification,
cleanup, and baseline restoration.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

docs/superpowers/plans/2026-04-01-openeo-batch-jobs.md ADDED
@@ -0,0 +1,1272 @@
# openEO Batch Job Processing — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace synchronous `cube.download()` with openEO batch jobs so NDVI (and later SAR, buildup, water) can process on CDSE free tier without hanging.

**Architecture:** Three-phase worker (submit → poll → harvest). openEO indicators gain `submit_batch()` and `harvest()` methods; non-openEO indicators keep `process()`. All batch jobs run in parallel on CDSE during the poll phase.

**Tech Stack:** openEO Python client (`openeo.rest.job.BatchJob`), asyncio, existing indicator/worker infrastructure.

**Spec:** `docs/superpowers/specs/2026-04-01-openeo-batch-jobs-design.md`

---

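The submit → poll → harvest cycle described above can be sketched in isolation before touching the real worker. This is a minimal illustration with stubbed jobs — `FakeJob` and `run_three_phase` are invented for the sketch and are not part of the codebase:

```python
import asyncio

class FakeJob:
    """Stand-in for openeo.rest.job.BatchJob: reports "running" for a few polls."""
    def __init__(self, polls_until_done: int = 2):
        self._left = polls_until_done

    def status(self) -> str:
        self._left -= 1
        return "finished" if self._left <= 0 else "running"

async def run_three_phase(submit, harvest, poll_interval: float = 0.01):
    jobs = submit()  # Phase 1: create and start all jobs server-side
    while not all(j.status() == "finished" for j in jobs):
        await asyncio.sleep(poll_interval)  # Phase 2: poll until every job finishes
    return harvest(jobs)  # Phase 3: download results and compute the indicator

result = asyncio.run(run_three_phase(
    submit=lambda: [FakeJob(), FakeJob(3)],
    harvest=lambda jobs: f"harvested {len(jobs)} jobs",
))
print(result)  # harvested 2 jobs
```

The key property the real worker preserves: nothing downloads until *all* jobs for an indicator report finished, so CDSE does the heavy lifting concurrently while the worker only sleeps and polls.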
### Task 1: Add `submit_as_batch()` helper to openeo_client.py

**Files:**
- Modify: `app/openeo_client.py` (add function after `_bbox_dict`)
- Test: `tests/test_openeo_client.py` (add test at end)

- [ ] **Step 1: Write the failing test**

Add to `tests/test_openeo_client.py`:

```python
def test_submit_as_batch_creates_and_starts_job():
    """submit_as_batch() creates a batch job and starts it."""
    from app.openeo_client import submit_as_batch

    mock_conn = MagicMock()
    mock_cube = MagicMock()
    mock_job = MagicMock()
    mock_job.job_id = "j-12345"
    mock_conn.create_job.return_value = mock_job

    result = submit_as_batch(mock_conn, mock_cube, "ndvi-current-Test")

    mock_conn.create_job.assert_called_once_with(mock_cube, title="ndvi-current-Test")
    mock_job.start.assert_called_once()
    assert result is mock_job
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_openeo_client.py::test_submit_as_batch_creates_and_starts_job -v`
Expected: FAIL with `ImportError: cannot import name 'submit_as_batch'`

- [ ] **Step 3: Write the implementation**

Add to `app/openeo_client.py` after the `_bbox_dict` function (after line 59):

```python
def submit_as_batch(
    conn: openeo.Connection, cube: openeo.DataCube, title: str
) -> "openeo.rest.job.BatchJob":
    """Submit a datacube as a batch job on CDSE and start it."""
    job = conn.create_job(cube, title=title)
    job.start()
    logger.info("Batch job %s started: %s", job.job_id, title)
    return job
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/test_openeo_client.py::test_submit_as_batch_creates_and_starts_job -v`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add app/openeo_client.py tests/test_openeo_client.py
git commit -m "feat: add submit_as_batch() helper for openEO batch jobs"
```

---

### Task 2: Add batch interface to BaseIndicator

**Files:**
- Modify: `app/indicators/base.py` (add `uses_batch`, `submit_batch`, `harvest`)
- Test: `tests/test_indicator_base.py` (new file)

- [ ] **Step 1: Write the failing test**

Create `tests/test_indicator_base.py`:

```python
"""Tests for BaseIndicator batch interface."""
from __future__ import annotations

import pytest
from unittest.mock import MagicMock
from app.indicators.base import BaseIndicator
from app.models import AOI, TimeRange, IndicatorResult, StatusLevel, TrendDirection, ConfidenceLevel
from datetime import date


class PlainIndicator(BaseIndicator):
    """Non-batch indicator for testing."""
    id = "plain"
    name = "Plain"
    category = "T1"
    question = "Test?"
    estimated_minutes = 1

    async def process(self, aoi, time_range, season_months=None):
        return IndicatorResult(
            indicator_id="plain", headline="ok",
            status=StatusLevel.GREEN, trend=TrendDirection.STABLE,
            confidence=ConfidenceLevel.HIGH, map_layer_path="",
            chart_data={}, summary="", methodology="", limitations=[],
        )


class BatchIndicator(BaseIndicator):
    """Batch indicator for testing."""
    id = "batch"
    name = "Batch"
    category = "T2"
    question = "Batch test?"
    estimated_minutes = 5
    uses_batch = True

    async def process(self, aoi, time_range, season_months=None):
        return IndicatorResult(
            indicator_id="batch", headline="fallback",
            status=StatusLevel.GREEN, trend=TrendDirection.STABLE,
            confidence=ConfidenceLevel.LOW, map_layer_path="",
            chart_data={}, data_source="placeholder",
            summary="", methodology="", limitations=[],
        )

    async def submit_batch(self, aoi, time_range, season_months=None):
        return [MagicMock()]

    async def harvest(self, aoi, time_range, season_months=None, batch_jobs=None):
        return IndicatorResult(
            indicator_id="batch", headline="harvested",
            status=StatusLevel.GREEN, trend=TrendDirection.STABLE,
            confidence=ConfidenceLevel.HIGH, map_layer_path="",
            chart_data={}, data_source="satellite",
            summary="", methodology="", limitations=[],
        )


def test_plain_indicator_uses_batch_is_false():
    ind = PlainIndicator()
    assert ind.uses_batch is False


def test_batch_indicator_uses_batch_is_true():
    ind = BatchIndicator()
    assert ind.uses_batch is True


@pytest.mark.asyncio
async def test_plain_indicator_submit_batch_raises():
    ind = PlainIndicator()
    with pytest.raises(NotImplementedError):
        await ind.submit_batch(
            AOI(name="T", bbox=[32, 15, 33, 16]),
            TimeRange(start=date(2025, 1, 1), end=date(2025, 6, 30)),
        )


@pytest.mark.asyncio
async def test_plain_indicator_harvest_raises():
    ind = PlainIndicator()
    with pytest.raises(NotImplementedError):
        await ind.harvest(
            AOI(name="T", bbox=[32, 15, 33, 16]),
            TimeRange(start=date(2025, 1, 1), end=date(2025, 6, 30)),
            batch_jobs=[],
        )


@pytest.mark.asyncio
async def test_batch_indicator_submit_returns_jobs():
    ind = BatchIndicator()
    jobs = await ind.submit_batch(
        AOI(name="T", bbox=[32, 15, 33, 16]),
        TimeRange(start=date(2025, 1, 1), end=date(2025, 6, 30)),
    )
    assert len(jobs) == 1


@pytest.mark.asyncio
async def test_batch_indicator_harvest_returns_result():
    ind = BatchIndicator()
    result = await ind.harvest(
        AOI(name="T", bbox=[32, 15, 33, 16]),
        TimeRange(start=date(2025, 1, 1), end=date(2025, 6, 30)),
        batch_jobs=[MagicMock()],
    )
    assert result.data_source == "satellite"
    assert result.headline == "harvested"
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `pytest tests/test_indicator_base.py -v`
Expected: FAIL — `uses_batch` attribute missing on `PlainIndicator`, `submit_batch` and `harvest` not on `BaseIndicator`

- [ ] **Step 3: Write the implementation**

Modify `app/indicators/base.py`. Add to the `BaseIndicator` class, after the `process` abstract method (after line 46):

```python
    uses_batch: bool = False

    async def submit_batch(
        self, aoi: AOI, time_range: TimeRange, season_months: list[int] | None = None
    ) -> list:
        """Submit openEO batch jobs. Override in batch indicators."""
        raise NotImplementedError(f"{self.id} does not support batch processing")

    async def harvest(
        self, aoi: AOI, time_range: TimeRange, season_months: list[int] | None = None,
        batch_jobs: list | None = None,
    ) -> IndicatorResult:
        """Download completed batch jobs and compute result. Override in batch indicators."""
        raise NotImplementedError(f"{self.id} does not support batch harvesting")
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_indicator_base.py -v`
Expected: All 6 tests PASS

- [ ] **Step 5: Run full suite to check nothing broke**

Run: `pytest tests/ -x -q`
Expected: 145+ tests pass

- [ ] **Step 6: Commit**

```bash
git add app/indicators/base.py tests/test_indicator_base.py
git commit -m "feat: add batch job interface to BaseIndicator (uses_batch, submit_batch, harvest)"
```

---

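The `uses_batch` flag added in this task is what lets a caller choose a flow per indicator and degrade to `process()` when submission fails (Task 5 implements this for real in the worker). A minimal dispatch sketch, using invented stand-in classes rather than the project's models:

```python
import asyncio

class Indicator:
    """Default: plain indicators only implement process()."""
    uses_batch = False

    async def process(self):
        return "processed"

    async def submit_batch(self):
        raise NotImplementedError

    async def harvest(self, jobs):
        raise NotImplementedError

class BatchCapable(Indicator):
    uses_batch = True

    async def submit_batch(self):
        return ["job-1"]

    async def harvest(self, jobs):
        return f"harvested {len(jobs)}"

async def run(ind: Indicator):
    if not ind.uses_batch:
        return await ind.process()
    try:
        jobs = await ind.submit_batch()
    except Exception:  # submission failed -> degrade to the synchronous path
        return await ind.process()
    return await ind.harvest(jobs)

print(asyncio.run(run(Indicator())))     # processed
print(asyncio.run(run(BatchCapable())))  # harvested 1
```

Keeping the capability as a class attribute (rather than `isinstance` checks) means the registry and worker never need to know about concrete indicator types.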
### Task 3: Implement NDVI `submit_batch()`

**Files:**
- Modify: `app/indicators/ndvi.py` (add `uses_batch`, `submit_batch`)
- Test: `tests/test_indicator_ndvi.py` (add test)

- [ ] **Step 1: Write the failing test**

Add to `tests/test_indicator_ndvi.py`:

```python
@pytest.mark.asyncio
async def test_ndvi_submit_batch_creates_three_jobs(test_aoi, test_time_range):
    """submit_batch() creates current, baseline, and true-color batch jobs."""
    from app.indicators.ndvi import NdviIndicator

    indicator = NdviIndicator()

    mock_conn = MagicMock()
    mock_job = MagicMock()
    mock_job.job_id = "j-test"
    mock_conn.create_job.return_value = mock_job

    with patch("app.indicators.ndvi.get_connection", return_value=mock_conn), \
         patch("app.indicators.ndvi.build_ndvi_graph") as mock_ndvi_graph, \
         patch("app.indicators.ndvi.build_true_color_graph") as mock_tc_graph:

        mock_ndvi_graph.return_value = MagicMock()
        mock_tc_graph.return_value = MagicMock()

        jobs = await indicator.submit_batch(test_aoi, test_time_range)

        assert len(jobs) == 3
        assert mock_conn.create_job.call_count == 3
        assert mock_job.start.call_count == 3

        # Verify graph builders called with correct temporal extents
        assert mock_ndvi_graph.call_count == 2  # current + baseline
        assert mock_tc_graph.call_count == 1  # true-color
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_indicator_ndvi.py::test_ndvi_submit_batch_creates_three_jobs -v`
Expected: FAIL — `NdviIndicator` has no `submit_batch` override yet

- [ ] **Step 3: Write the implementation**

Add to `app/indicators/ndvi.py` in the `NdviIndicator` class. Add import at top of file:

```python
from app.openeo_client import get_connection, build_ndvi_graph, build_true_color_graph, _bbox_dict, submit_as_batch
```

Add class attribute and method after `_true_color_path` (after line 42):

```python
    uses_batch = True

    async def submit_batch(
        self, aoi: AOI, time_range: TimeRange, season_months: list[int] | None = None
    ) -> list:
        conn = get_connection()
        bbox = _bbox_dict(aoi.bbox)

        current_start = time_range.start.isoformat()
        current_end = time_range.end.isoformat()

        baseline_start = date(
            time_range.start.year - BASELINE_YEARS,
            time_range.start.month,
            time_range.start.day,
        ).isoformat()
        baseline_end = date(
            time_range.start.year,
            time_range.start.month,
            time_range.start.day,
        ).isoformat()

        current_cube = build_ndvi_graph(
            conn=conn, bbox=bbox,
            temporal_extent=[current_start, current_end],
            resolution_m=RESOLUTION_M,
        )
        baseline_cube = build_ndvi_graph(
            conn=conn, bbox=bbox,
            temporal_extent=[baseline_start, baseline_end],
            resolution_m=RESOLUTION_M,
        )
        true_color_cube = build_true_color_graph(
            conn=conn, bbox=bbox,
            temporal_extent=[current_start, current_end],
            resolution_m=RESOLUTION_M,
        )

        return [
            submit_as_batch(conn, current_cube, f"ndvi-current-{aoi.name}"),
            submit_as_batch(conn, baseline_cube, f"ndvi-baseline-{aoi.name}"),
            submit_as_batch(conn, true_color_cube, f"ndvi-truecolor-{aoi.name}"),
        ]
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/test_indicator_ndvi.py::test_ndvi_submit_batch_creates_three_jobs -v`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add app/indicators/ndvi.py tests/test_indicator_ndvi.py
git commit -m "feat: implement NdviIndicator.submit_batch() for openEO batch jobs"
```

---

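One edge case worth noting in the baseline window above: `date(year - BASELINE_YEARS, month, day)` raises `ValueError` when the analysis start is Feb 29 and the target year is not a leap year. A defensive sketch of the same arithmetic — the `shift_years` helper is illustrative, not part of the codebase:

```python
from datetime import date

def shift_years(d: date, years: int) -> date:
    """Subtract whole years, clamping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year - years, day=28)

BASELINE_YEARS = 3  # assumed value for this sketch

start = date(2025, 3, 1)
baseline_start = shift_years(start, BASELINE_YEARS)  # 2022-03-01
baseline_end = start  # the baseline window ends where the analysis window begins
```

`shift_years(date(2024, 2, 29), 1)` clamps to `date(2023, 2, 28)` instead of raising, which keeps batch submission from failing on one calendar corner case.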
### Task 4: Implement NDVI `harvest()`

**Files:**
- Modify: `app/indicators/ndvi.py` (add `harvest`)
- Test: `tests/test_indicator_ndvi.py` (add test)

- [ ] **Step 1: Write the failing test**

Add to `tests/test_indicator_ndvi.py`:

```python
@pytest.mark.asyncio
async def test_ndvi_harvest_computes_result_from_batch_jobs(test_aoi, test_time_range):
    """harvest() downloads batch results and returns IndicatorResult."""
    from app.indicators.ndvi import NdviIndicator

    indicator = NdviIndicator()

    with tempfile.TemporaryDirectory() as tmpdir:
        ndvi_path = os.path.join(tmpdir, "ndvi.tif")
        rgb_path = os.path.join(tmpdir, "rgb.tif")
        _mock_ndvi_tif(ndvi_path)
        _mock_true_color_tif(rgb_path)

        def make_mock_job(src_path):
            job = MagicMock()
            job.job_id = "j-test"

            def fake_download_results(target):
                import shutil
                os.makedirs(target, exist_ok=True)
                dest = os.path.join(target, "result.tif")
                shutil.copy(src_path, dest)
                from pathlib import Path
                return {Path(dest): {"type": "image/tiff"}}

            job.download_results.side_effect = fake_download_results
            job.status.return_value = "finished"
            return job

        current_job = make_mock_job(ndvi_path)
        baseline_job = make_mock_job(ndvi_path)
        true_color_job = make_mock_job(rgb_path)

        result = await indicator.harvest(
            test_aoi, test_time_range,
            batch_jobs=[current_job, baseline_job, true_color_job],
        )

        assert result.indicator_id == "ndvi"
        assert result.data_source == "satellite"
        assert result.status in (StatusLevel.GREEN, StatusLevel.AMBER, StatusLevel.RED)
        assert result.confidence in (ConfidenceLevel.HIGH, ConfidenceLevel.MODERATE, ConfidenceLevel.LOW)
        assert len(result.chart_data.get("dates", [])) > 0


@pytest.mark.asyncio
async def test_ndvi_harvest_degrades_when_baseline_fails(test_aoi, test_time_range):
    """harvest() returns partial result when baseline job failed."""
    from app.indicators.ndvi import NdviIndicator

    indicator = NdviIndicator()

    with tempfile.TemporaryDirectory() as tmpdir:
        ndvi_path = os.path.join(tmpdir, "ndvi.tif")
        rgb_path = os.path.join(tmpdir, "rgb.tif")
        _mock_ndvi_tif(ndvi_path)
        _mock_true_color_tif(rgb_path)

        def make_mock_job(src_path, status="finished"):
            job = MagicMock()
            job.job_id = "j-test"
            job.status.return_value = status

            def fake_download_results(target):
                if status == "error":
                    raise Exception("Batch job failed on CDSE")
                os.makedirs(target, exist_ok=True)
                dest = os.path.join(target, "result.tif")
                import shutil
                shutil.copy(src_path, dest)
                from pathlib import Path
                return {Path(dest): {"type": "image/tiff"}}

            job.download_results.side_effect = fake_download_results
            return job

        current_job = make_mock_job(ndvi_path)
        baseline_job = make_mock_job(ndvi_path, status="error")
        true_color_job = make_mock_job(rgb_path)

        result = await indicator.harvest(
            test_aoi, test_time_range,
            batch_jobs=[current_job, baseline_job, true_color_job],
        )

        assert result.indicator_id == "ndvi"
        assert result.data_source == "satellite"
        assert result.confidence == ConfidenceLevel.LOW
        assert result.trend == TrendDirection.STABLE


@pytest.mark.asyncio
async def test_ndvi_harvest_falls_back_when_current_fails(test_aoi, test_time_range):
    """harvest() returns placeholder when current NDVI job failed."""
    from app.indicators.ndvi import NdviIndicator

    indicator = NdviIndicator()

    current_job = MagicMock()
    current_job.status.return_value = "error"
    current_job.download_results.side_effect = Exception("failed")
    baseline_job = MagicMock()
    baseline_job.status.return_value = "finished"
    true_color_job = MagicMock()
    true_color_job.status.return_value = "finished"

    result = await indicator.harvest(
        test_aoi, test_time_range,
        batch_jobs=[current_job, baseline_job, true_color_job],
    )

    assert result.data_source == "placeholder"
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `pytest tests/test_indicator_ndvi.py -k harvest -v`
Expected: FAIL — `harvest` not implemented

- [ ] **Step 3: Write the implementation**

Add to `app/indicators/ndvi.py` in the `NdviIndicator` class, after `submit_batch`:

```python
    async def harvest(
        self, aoi: AOI, time_range: TimeRange, season_months: list[int] | None = None,
        batch_jobs: list | None = None,
    ) -> IndicatorResult:
        """Download completed batch job results and compute NDVI statistics."""
        current_job, baseline_job, true_color_job = batch_jobs

        results_dir = tempfile.mkdtemp(prefix="aperture_ndvi_batch_")

        # Download current NDVI — required
        try:
            current_dir = os.path.join(results_dir, "current")
            paths = current_job.download_results(current_dir)
            current_path = self._find_tif(paths, current_dir)
        except Exception as exc:
            logger.warning("NDVI current batch download failed: %s", exc)
            return self._fallback(aoi, time_range)

        # Download baseline — optional (degrades gracefully)
        baseline_path = None
        try:
            baseline_dir = os.path.join(results_dir, "baseline")
            paths = baseline_job.download_results(baseline_dir)
            baseline_path = self._find_tif(paths, baseline_dir)
        except Exception as exc:
            logger.warning("NDVI baseline batch download failed, degrading: %s", exc)

        # Download true-color — optional
        true_color_path = None
        try:
            tc_dir = os.path.join(results_dir, "truecolor")
            paths = true_color_job.download_results(tc_dir)
            true_color_path = self._find_tif(paths, tc_dir)
        except Exception as exc:
            logger.warning("NDVI true-color batch download failed: %s", exc)

        # Compute statistics
        current_stats = self._compute_stats(current_path)
        current_mean = current_stats["overall_mean"]

        if baseline_path:
            baseline_stats = self._compute_stats(baseline_path)
            baseline_mean = baseline_stats["overall_mean"]
            change = current_mean - baseline_mean
            confidence = (
                ConfidenceLevel.HIGH if current_stats["valid_months"] >= 6
                else ConfidenceLevel.MODERATE if current_stats["valid_months"] >= 3
                else ConfidenceLevel.LOW
            )
            chart_data = self._build_chart_data(
                current_stats["monthly_means"],
                baseline_stats["monthly_means"],
                time_range,
            )
        else:
            baseline_mean = current_mean
            change = 0.0
            confidence = ConfidenceLevel.LOW
            chart_data = {
                "dates": [f"{time_range.end.year}-{m+1:02d}" for m in range(len(current_stats["monthly_means"]))],
                "values": [round(v, 3) for v in current_stats["monthly_means"]],
                "label": "NDVI",
            }

        status = self._classify(change)
        trend = self._compute_trend(change) if baseline_path else TrendDirection.STABLE

        if abs(change) <= 0.05:
            headline = f"Vegetation stable (NDVI {current_mean:.2f}, Δ{change:+.2f} vs baseline)"
        elif change > 0:
            headline = f"Vegetation greening (NDVI +{change:.2f} vs baseline)"
        else:
            headline = f"Vegetation decline (NDVI {change:.2f} vs baseline)"

        self._spatial_data = SpatialData(
            map_type="raster", label="NDVI", colormap="RdYlGn",
            vmin=-0.2, vmax=0.9,
        )
        self._indicator_raster_path = current_path
        self._true_color_path = true_color_path
        self._ndvi_peak_band = current_stats["peak_month_band"]
        self._render_band = current_stats["peak_month_band"]

        return IndicatorResult(
            indicator_id=self.id,
            headline=headline,
            status=status,
            trend=trend,
            confidence=confidence,
            map_layer_path=current_path,
            chart_data=chart_data,
            data_source="satellite",
            summary=(
                f"Mean NDVI is {current_mean:.3f} compared to a {BASELINE_YEARS}-year "
                f"baseline of {baseline_mean:.3f} (Δ{change:+.3f}). "
                f"Pixel-level analysis at {RESOLUTION_M}m resolution from "
                f"{current_stats['valid_months']} monthly composites."
            ),
            methodology=(
                f"Sentinel-2 L2A pixel-level NDVI = (B08 − B04) / (B08 + B04). "
                f"Cloud-masked using SCL band (classes 4, 5, 6 retained). "
                f"Monthly median composites at {RESOLUTION_M}m resolution. "
                f"Baseline: {BASELINE_YEARS}-year monthly medians. "
                f"Processed server-side via CDSE openEO batch jobs."
            ),
            limitations=[
                f"Resampled to {RESOLUTION_M}m — sub-field variability not captured at this resolution.",
                "Cloud cover reduces observation count in rainy seasons.",
                "NDVI does not distinguish crop from natural vegetation.",
                "Seasonal variation may mask long-term trends if analysis windows differ.",
            ] + (["Baseline unavailable — change and trend not computed."] if not baseline_path else []),
        )

    @staticmethod
    def _find_tif(download_paths: dict, fallback_dir: str) -> str:
        """Find the GeoTIFF file from batch job download results."""
        if download_paths:
            for p in download_paths:
                if str(p).endswith(".tif") or str(p).endswith(".tiff"):
                    return str(p)
        # Fallback: look for any .tif in the directory
        for f in os.listdir(fallback_dir):
            if f.endswith(".tif") or f.endswith(".tiff"):
                return os.path.join(fallback_dir, f)
        raise FileNotFoundError(f"No GeoTIFF found in {fallback_dir}")
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_indicator_ndvi.py -k harvest -v`
Expected: All 3 harvest tests PASS

- [ ] **Step 5: Run full suite**

Run: `pytest tests/ -x -q`
Expected: All tests pass (existing tests unaffected since `process()` is unchanged)

- [ ] **Step 6: Commit**

```bash
git add app/indicators/ndvi.py tests/test_indicator_ndvi.py
git commit -m "feat: implement NdviIndicator.harvest() with graceful degradation"
```

---

639
+ ### Task 5: Rewrite worker `process_job()` with three-phase flow
640
+
641
+ **Files:**
642
+ - Modify: `app/worker.py` (rewrite `process_job`)
643
+ - Test: `tests/test_worker.py` (add batch worker test)
644
+
645
+ - [ ] **Step 1: Write the failing test**
646
+
647
+ Add to `tests/test_worker.py`:
648
+
649
+ ```python
650
+ class MockBatchIndicator(BaseIndicator):
651
+ """Batch indicator for testing the three-phase worker."""
652
+ id = "ndvi"
653
+ name = "Vegetation (NDVI)"
654
+ category = "D2"
655
+ question = "Is vegetation cover declining?"
656
+ estimated_minutes = 8
657
+ uses_batch = True
658
+
659
+ async def process(self, aoi, time_range, season_months=None):
660
+ return IndicatorResult(
661
+ indicator_id="ndvi", headline="placeholder",
662
+ status=StatusLevel.GREEN, trend=TrendDirection.STABLE,
663
+ confidence=ConfidenceLevel.LOW, map_layer_path="",
664
+ chart_data={"dates": ["2025"], "values": [0.3], "label": "NDVI"},
665
+ data_source="placeholder",
666
+ summary="Fallback.", methodology="Placeholder.", limitations=[],
667
+ )
668
+
669
+ async def submit_batch(self, aoi, time_range, season_months=None):
670
+ mock_job = MagicMock()
671
+ mock_job.job_id = "j-test"
672
+ mock_job.status.return_value = "finished"
673
+ return [mock_job, mock_job, mock_job]
674
+
675
+ async def harvest(self, aoi, time_range, season_months=None, batch_jobs=None):
676
+ return IndicatorResult(
677
+ indicator_id="ndvi", headline="Real NDVI data",
678
+ status=StatusLevel.GREEN, trend=TrendDirection.STABLE,
679
+ confidence=ConfidenceLevel.HIGH, map_layer_path="",
680
+ chart_data={"dates": ["2025-01"], "values": [0.45], "label": "NDVI"},
681
+ data_source="satellite",
682
+ summary="Real.", methodology="Sentinel-2.", limitations=[],
683
+ )
684
+
685
+
686
+ @pytest.mark.asyncio
687
+ async def test_process_job_uses_batch_flow(temp_db_path):
688
+ """Worker uses submit_batch → poll → harvest for batch indicators."""
689
+ db = Database(temp_db_path)
690
+ await db.init()
691
+ reg = IndicatorRegistry()
692
+ reg.register(MockBatchIndicator())
693
+ request = JobRequest(
694
+ aoi=AOI(name="Test", bbox=[32.45, 15.65, 32.65, 15.80]),
695
+ time_range=TimeRange(start=date(2025, 3, 1), end=date(2026, 3, 1)),
696
+ indicator_ids=["ndvi"],
697
+ email="test@example.com",
698
+ )
699
+ job_id = await db.create_job(request)
700
+ await process_job(job_id, db, reg)
701
+ job = await db.get_job(job_id)
702
+ assert job.status == JobStatus.COMPLETE
703
+ assert len(job.results) == 1
704
+ assert job.results[0].data_source == "satellite"
705
+ assert job.results[0].headline == "Real NDVI data"
706
+
707
+
708
+ @pytest.mark.asyncio
709
+ async def test_process_job_mixes_batch_and_process(temp_db_path):
710
+ """Worker handles batch and non-batch indicators in the same job."""
711
+ db = Database(temp_db_path)
712
+ await db.init()
713
+ reg = IndicatorRegistry()
714
+ reg.register(MockBatchIndicator())
715
+ reg.register(MockFiresIndicator())
716
+ request = JobRequest(
717
+ aoi=AOI(name="Test", bbox=[32.45, 15.65, 32.65, 15.80]),
718
+ time_range=TimeRange(start=date(2025, 3, 1), end=date(2026, 3, 1)),
719
+ indicator_ids=["ndvi", "fires"],
720
+ email="test@example.com",
721
+ )
722
+ job_id = await db.create_job(request)
723
+ await process_job(job_id, db, reg)
724
+ job = await db.get_job(job_id)
725
+ assert job.status == JobStatus.COMPLETE
726
+ assert len(job.results) == 2
727
+
728
+ ndvi_result = next(r for r in job.results if r.indicator_id == "ndvi")
729
+ fires_result = next(r for r in job.results if r.indicator_id == "fires")
730
+ assert ndvi_result.data_source == "satellite"
731
+ assert fires_result.headline == "3 fire events detected"
732
+
733
+
734
+ @pytest.mark.asyncio
735
+ async def test_process_job_batch_submit_failure_falls_back(temp_db_path):
736
+ """If submit_batch() fails, worker falls back to process()."""
737
+
738
+ class FailingBatchIndicator(MockBatchIndicator):
739
+ async def submit_batch(self, aoi, time_range, season_months=None):
740
+ raise ConnectionError("CDSE unreachable")
741
+
742
+ db = Database(temp_db_path)
743
+ await db.init()
744
+ reg = IndicatorRegistry()
745
+ reg.register(FailingBatchIndicator())
746
+ request = JobRequest(
747
+ aoi=AOI(name="Test", bbox=[32.45, 15.65, 32.65, 15.80]),
748
+ time_range=TimeRange(start=date(2025, 3, 1), end=date(2026, 3, 1)),
749
+ indicator_ids=["ndvi"],
750
+ email="test@example.com",
751
+ )
752
+ job_id = await db.create_job(request)
753
+ await process_job(job_id, db, reg)
754
+ job = await db.get_job(job_id)
755
+ assert job.status == JobStatus.COMPLETE
756
+ assert job.results[0].data_source == "placeholder"
757
+ ```
758
+
759
+ - [ ] **Step 2: Run tests to verify they fail**
760
+
761
+ Run: `pytest tests/test_worker.py -k batch -v`
762
+ Expected: FAIL — current `process_job` doesn't call `submit_batch` or `harvest`
763
+
764
+ - [ ] **Step 3: Write the implementation**
765
+
766
+ Replace `process_job` in `app/worker.py` (lines 47-207):
767
+

```python
BATCH_POLL_INTERVAL = 30  # seconds between status checks
BATCH_TIMEOUT = 1200  # 20 minutes maximum wait


async def process_job(job_id: str, db: Database, registry: IndicatorRegistry) -> None:
    job = await db.get_job(job_id)
    if job is None:
        logger.error(f"Job {job_id} not found")
        return
    await db.update_job_status(job_id, JobStatus.PROCESSING)
    try:
        spatial_cache = {}

        # Separate batch vs non-batch indicators
        batch_indicators = {}
        process_indicators = []
        for indicator_id in job.request.indicator_ids:
            indicator = registry.get(indicator_id)
            if indicator.uses_batch:
                batch_indicators[indicator_id] = indicator
            else:
                process_indicators.append((indicator_id, indicator))

        # ── Phase 1: Submit batch jobs ──
        batch_submissions = {}  # {indicator_id: list[BatchJob]}
        fallback_ids = set()  # indicators that failed to submit
        for indicator_id, indicator in batch_indicators.items():
            await db.update_job_progress(job_id, indicator_id, "submitting")
            try:
                jobs = await indicator.submit_batch(
                    job.request.aoi,
                    job.request.time_range,
                    season_months=job.request.season_months(),
                )
                batch_submissions[indicator_id] = jobs
                await db.update_job_progress(job_id, indicator_id, "processing on CDSE")
            except Exception as exc:
                logger.warning("Batch submit failed for %s, will use fallback: %s", indicator_id, exc)
                fallback_ids.add(indicator_id)

        # ── Phase 2: Poll until all batch jobs finish ──
        poll_start = time.monotonic()
        pending = dict(batch_submissions)

        while pending:
            elapsed = time.monotonic() - poll_start
            if elapsed >= BATCH_TIMEOUT:
                logger.warning("Batch poll timeout after %.0fs, remaining: %s", elapsed, list(pending.keys()))
                fallback_ids.update(pending.keys())
                break

            await asyncio.sleep(BATCH_POLL_INTERVAL)

            for indicator_id in list(pending.keys()):
                jobs = pending[indicator_id]
                statuses = [j.status() for j in jobs]
                if all(s == "finished" for s in statuses):
                    logger.info("Batch jobs finished for %s", indicator_id)
                    del pending[indicator_id]
                elif any(s in ("error", "canceled") for s in statuses):
                    logger.warning("Batch job failed for %s: %s", indicator_id, statuses)
                    del pending[indicator_id]
                    # Don't add to fallback — harvest() handles partial failure

        # ── Phase 3: Harvest batch results + process non-batch indicators ──
        for indicator_id in job.request.indicator_ids:
            indicator = registry.get(indicator_id)

            if indicator_id in fallback_ids:
                # Submit failed or timed out — use process() fallback
                await db.update_job_progress(job_id, indicator_id, "processing")
                result = await indicator.process(
                    job.request.aoi,
                    job.request.time_range,
                    season_months=job.request.season_months(),
                )
            elif indicator_id in batch_submissions:
                # Harvest batch results
                await db.update_job_progress(job_id, indicator_id, "downloading")
                try:
                    result = await indicator.harvest(
                        job.request.aoi,
                        job.request.time_range,
                        season_months=job.request.season_months(),
                        batch_jobs=batch_submissions[indicator_id],
                    )
                except Exception as exc:
                    logger.warning("Harvest failed for %s, using fallback: %s", indicator_id, exc)
                    result = await indicator.process(
                        job.request.aoi,
                        job.request.time_range,
                        season_months=job.request.season_months(),
                    )
            else:
                # Non-batch indicator — use process() directly
                await db.update_job_progress(job_id, indicator_id, "processing")
                result = await indicator.process(
                    job.request.aoi,
                    job.request.time_range,
                    season_months=job.request.season_months(),
                )

            spatial = indicator.get_spatial_data()
            if spatial is not None:
                spatial_cache[indicator_id] = spatial

            await db.save_job_result(job_id, result)
            await db.update_job_progress(job_id, indicator_id, "complete")

        # ── Generate outputs (unchanged) ──
        job = await db.get_job(job_id)
        results_dir = os.path.join("results", job_id)
        os.makedirs(results_dir, exist_ok=True)

        output_files = []

        for result in job.results:
            chart_path = os.path.join(results_dir, f"{result.indicator_id}_chart.png")
            render_timeseries_chart(
                chart_data=result.chart_data,
                indicator_name=_indicator_label(result.indicator_id),
                status=result.status,
                trend=result.trend,
                output_path=chart_path,
            )
            output_files.append(chart_path)

            spatial = spatial_cache.get(result.indicator_id)
            map_path = os.path.join(results_dir, f"{result.indicator_id}_map.png")

            if spatial is not None and spatial.map_type == "raster":
                indicator_obj = registry.get(result.indicator_id)
                raster_path = getattr(indicator_obj, '_indicator_raster_path', None)
                true_color_path = getattr(indicator_obj, '_true_color_path', None)
                render_band = getattr(indicator_obj, '_render_band', 1)
                from app.outputs.maps import render_raster_map
                render_raster_map(
                    true_color_path=true_color_path,
                    indicator_path=raster_path,
                    indicator_band=render_band,
                    aoi=job.request.aoi,
                    status=result.status,
                    output_path=map_path,
                    cmap=spatial.colormap,
                    vmin=spatial.vmin,
                    vmax=spatial.vmax,
                    label=spatial.label,
                )
            elif spatial is not None:
                render_indicator_map(
                    spatial=spatial,
                    aoi=job.request.aoi,
                    status=result.status,
                    output_path=map_path,
                )
            else:
                render_status_map(
                    aoi=job.request.aoi,
                    status=result.status,
                    output_path=map_path,
                )
            output_files.append(map_path)

            spatial_json_path = os.path.join(results_dir, f"{result.indicator_id}_spatial.json")
            _save_spatial_json(spatial, result.status.value, spatial_json_path)

        indicator_map_paths = {}
        for result in job.results:
            mp = os.path.join(results_dir, f"{result.indicator_id}_map.png")
            if os.path.exists(mp):
                indicator_map_paths[result.indicator_id] = mp

        from app.models import StatusLevel
        worst_status = max(
            (r.status for r in job.results),
            key=lambda s: [StatusLevel.GREEN, StatusLevel.AMBER, StatusLevel.RED].index(s),
        )
        summary_map_path = os.path.join(results_dir, "summary_map.png")
        render_status_map(aoi=job.request.aoi, status=worst_status, output_path=summary_map_path)
        output_files.append(summary_map_path)

        overview_score = compute_composite_score(job.results)

        overview_score_path = os.path.join(results_dir, "overview_score.json")
        write_overview_score(overview_score, overview_score_path)
        output_files.append(overview_score_path)

        overview_map_path = os.path.join(results_dir, "overview_map.png")
        true_color_path = None
        for ind_id in job.request.indicator_ids:
            ind_obj = registry.get(ind_id)
            tc = getattr(ind_obj, '_true_color_path', None)
            if tc and os.path.exists(tc):
                true_color_path = tc
                break

        if true_color_path:
            render_overview_map(
                true_color_path=true_color_path,
                aoi=job.request.aoi,
                output_path=overview_map_path,
                title=f"{job.request.aoi.name} — Satellite Overview",
                date_range=f"{job.request.time_range.start} to {job.request.time_range.end}",
            )
            output_files.append(overview_map_path)

        report_path = os.path.join(results_dir, "report.pdf")
        generate_pdf_report(
            aoi=job.request.aoi,
            time_range=job.request.time_range,
            results=job.results,
            output_path=report_path,
            summary_map_path=summary_map_path,
            indicator_map_paths=indicator_map_paths,
            overview_score=overview_score,
            overview_map_path=overview_map_path if true_color_path else "",
        )
        output_files.append(report_path)

        package_path = os.path.join(results_dir, "package.zip")
        create_data_package(files=output_files, output_path=package_path)

        await db.update_job_status(job_id, JobStatus.COMPLETE)

        await send_completion_email(
            to_email=job.request.email,
            job_id=job_id,
            aoi_name=job.request.aoi.name,
        )
    except Exception as e:
        logger.exception(f"Job {job_id} failed: {e}")
        await db.update_job_status(job_id, JobStatus.FAILED, error=str(e))
```
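
The Phase 2 loop reduces each indicator's list of batch-job statuses to a single outcome. As a standalone sketch of that decision rule (the helper name is illustrative, not part of the worker):

```python
def classify_batch_statuses(statuses: list[str]) -> str:
    """Reduce a list of openEO batch-job statuses to one outcome.

    Mirrors the Phase 2 rule: an indicator is harvestable only when
    every job finished; any error/canceled job drops it from the
    pending set; anything else means keep polling.
    """
    if all(s == "finished" for s in statuses):
        return "finished"
    if any(s in ("error", "canceled") for s in statuses):
        return "failed"
    return "pending"
```

With `BATCH_POLL_INTERVAL = 30` and `BATCH_TIMEOUT = 1200`, the loop makes at most 40 status sweeps before forcing any still-pending indicators onto the fallback path.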

Add `import time` to the module-level imports in `app/worker.py` if it is not already there; the replacement's polling phase calls `time.monotonic()`.

- [ ] **Step 4: Run batch worker tests**

  Run: `pytest tests/test_worker.py -v`
  Expected: All tests PASS (existing + 3 new batch tests)

- [ ] **Step 5: Run full suite**

  Run: `pytest tests/ -x -q`
  Expected: All tests pass

- [ ] **Step 6: Commit**

```bash
git add app/worker.py tests/test_worker.py
git commit -m "feat: three-phase batch job worker (submit → poll → harvest)"
```

---

### Task 6: Update E2E test for batch flow

**Files:**
- Modify: `tests/test_ndvi_e2e.py` (add batch E2E test)

- [ ] **Step 1: Add batch E2E test**

  Add to `tests/test_ndvi_e2e.py`:

```python
@pytest.mark.asyncio
async def test_ndvi_batch_pipeline():
    """Batch pipeline: submit → harvest → render map → render chart."""
    from app.indicators.ndvi import NdviIndicator
    import shutil

    aoi = AOI(name="Khartoum Batch", bbox=BBOX)
    time_range = TimeRange(start=date(2025, 3, 1), end=date(2026, 3, 1))

    with tempfile.TemporaryDirectory() as tmpdir:
        ndvi_path = os.path.join(tmpdir, "ndvi.tif")
        rgb_path = os.path.join(tmpdir, "rgb.tif")
        _write_ndvi_tif(ndvi_path)
        _write_rgb_tif(rgb_path)

        mock_conn = MagicMock()

        def make_mock_job(src_path):
            job = MagicMock()
            job.job_id = "j-e2e"
            job.status.return_value = "finished"

            def fake_download_results(target):
                os.makedirs(target, exist_ok=True)
                dest = os.path.join(target, "result.tif")
                shutil.copy(src_path, dest)
                from pathlib import Path
                return {Path(dest): {"type": "image/tiff"}}

            job.download_results.side_effect = fake_download_results
            return job

        mock_ndvi_job = make_mock_job(ndvi_path)
        mock_tc_job = make_mock_job(rgb_path)
        mock_conn.create_job.return_value = mock_ndvi_job

        with patch("app.indicators.ndvi.get_connection", return_value=mock_conn), \
             patch("app.indicators.ndvi.build_ndvi_graph", return_value=MagicMock()), \
             patch("app.indicators.ndvi.build_true_color_graph", return_value=MagicMock()), \
             patch("app.indicators.ndvi.submit_as_batch") as mock_submit:

            mock_submit.side_effect = [
                make_mock_job(ndvi_path),  # current
                make_mock_job(ndvi_path),  # baseline
                make_mock_job(rgb_path),   # true-color
            ]

            indicator = NdviIndicator()

            # Phase 1: submit
            jobs = await indicator.submit_batch(aoi, time_range)
            assert len(jobs) == 3

            # Phase 3: harvest
            result = await indicator.harvest(aoi, time_range, batch_jobs=jobs)

            assert result.indicator_id == "ndvi"
            assert result.data_source == "satellite"
            assert len(result.chart_data["dates"]) >= 6

            # Render the raster map
            map_out = os.path.join(tmpdir, "ndvi_map.png")
            raster_path = indicator._indicator_raster_path
            tc_path = indicator._true_color_path
            peak = indicator._ndvi_peak_band

            render_raster_map(
                true_color_path=tc_path,
                indicator_path=raster_path,
                indicator_band=peak,
                aoi=aoi,
                status=result.status,
                output_path=map_out,
                cmap="RdYlGn",
                vmin=-0.2,
                vmax=0.9,
                label="NDVI",
            )
            assert os.path.exists(map_out)
            assert os.path.getsize(map_out) > 10000

            # Render the chart
            chart_out = os.path.join(tmpdir, "ndvi_chart.png")
            render_timeseries_chart(
                chart_data=result.chart_data,
                indicator_name="Vegetation (NDVI)",
                status=result.status,
                trend=result.trend,
                output_path=chart_out,
                y_label="NDVI",
            )
            assert os.path.exists(chart_out)
            assert os.path.getsize(chart_out) > 5000
```

- [ ] **Step 2: Run test**

  Run: `pytest tests/test_ndvi_e2e.py::test_ndvi_batch_pipeline -v`
  Expected: PASS

- [ ] **Step 3: Run full suite**

  Run: `pytest tests/ -x -q`
  Expected: All tests pass

- [ ] **Step 4: Commit**

```bash
git add tests/test_ndvi_e2e.py
git commit -m "test: add batch flow E2E test for NDVI pipeline"
```

---

### Task 7: Deploy and verify on HF Space

**Files:**
- No code changes — deployment and live verification

- [ ] **Step 1: Push to both remotes**

```bash
git push origin main && git push hf main
```

- [ ] **Step 2: Wait for Space rebuild**

  Poll until the Space is running with the new SHA:

```bash
python3 -c "
from huggingface_hub import HfApi
api = HfApi()
rt = api.get_space_runtime('MERLx/Aperture')
print('Stage:', rt.stage, 'SHA:', rt.raw.get('sha', 'unknown'))
"
```

  Wait until `Stage: RUNNING` with the latest commit SHA.
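
If the manual check gets tedious, the wait can be wrapped in a generic polling helper; a minimal sketch (the helper name and defaults are illustrative, not part of the codebase):

```python
import time


def wait_for(check, timeout=600, interval=20):
    """Call check() every `interval` seconds until it returns a truthy
    value or `timeout` seconds elapse; return the truthy value."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        value = check()
        if value:
            return value
        time.sleep(interval)
    raise TimeoutError("condition not met before deadline")
```

For example, `wait_for(lambda: api.get_space_runtime('MERLx/Aperture').stage == 'RUNNING')` using the `HfApi` instance from the snippet above.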

- [ ] **Step 3: Submit NDVI test job**

```bash
# Get auth token
curl -s -X POST https://merlx-aperture.hf.space/api/auth/request \
  -H 'Content-Type: application/json' -d '{"email":"test@aperture.dev"}'

# Submit job (use the demo_token from response)
AUTH="Bearer test@aperture.dev:<token>"
curl -s -X POST https://merlx-aperture.hf.space/api/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: $AUTH" \
  -d '{
    "aoi": {"name": "Khartoum Batch Test", "bbox": [32.52, 15.58, 32.57, 15.63]},
    "time_range": {"start": "2025-01-01", "end": "2025-06-30"},
    "indicator_ids": ["ndvi"],
    "email": "test@aperture.dev"
  }'
```

- [ ] **Step 4: Poll until complete**

```bash
curl -s https://merlx-aperture.hf.space/api/jobs/<job_id> \
  -H "Authorization: $AUTH" | python3 -m json.tool
```

  Poll every 2 minutes. Check progress states: `submitting` → `processing on CDSE` → `downloading` → `complete`.

- [ ] **Step 5: Verify real data**

  Check the response for:
  - `data_source` is `"satellite"` (not `"placeholder"`)
  - `confidence` is `"high"` or `"moderate"` (not `"low"`)
  - `methodology` contains `"batch jobs"`
  - `chart_data.dates` has monthly entries
  - `chart_data.values` has realistic NDVI values (0.1–0.8 range)
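
The checks above can also be scripted against the JSON response; a minimal sketch (field names taken from the checklist, function name hypothetical):

```python
def verify_ndvi_result(result: dict) -> list[str]:
    """Return a list of failed checks for a completed NDVI job result."""
    problems = []
    if result.get("data_source") != "satellite":
        problems.append("data_source is not 'satellite'")
    if result.get("confidence") not in ("high", "moderate"):
        problems.append("confidence is not high/moderate")
    if "batch jobs" not in result.get("methodology", ""):
        problems.append("methodology does not mention batch jobs")
    chart = result.get("chart_data", {})
    if not chart.get("dates"):
        problems.append("chart_data.dates is empty")
    values = [v for v in chart.get("values", []) if v is not None]
    if not values or not all(0.1 <= v <= 0.8 for v in values):
        problems.append("NDVI values missing or outside 0.1-0.8")
    return problems
```

An empty return list means the live job produced real satellite data.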

---

### Task 8: Clean up diagnostic endpoint

**Files:**
- Modify: `app/main.py` (remove `/api/debug/cdse-status`)

- [ ] **Step 1: Remove diagnostic endpoint**

  Delete the entire `@app.get("/api/debug/cdse-status")` function from `app/main.py` (the block added during debugging).

- [ ] **Step 2: Run tests**

  Run: `pytest tests/ -x -q`
  Expected: All tests pass

- [ ] **Step 3: Commit**

```bash
git add app/main.py
git commit -m "chore: remove temporary CDSE diagnostic endpoint"
```

- [ ] **Step 4: Push to both remotes**

```bash
git push origin main && git push hf main
```

---

### Task 9: Restore BASELINE_YEARS to 5

**Files:**
- Modify: `app/indicators/ndvi.py` (change constant)

- [ ] **Step 1: Update constant**

  In `app/indicators/ndvi.py`, change:

```python
BASELINE_YEARS = 1
```

  to:

```python
BASELINE_YEARS = 5
```
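
For scale: the constant widens the baseline from one seasonal cycle to five. A sketch of one plausible window computation (illustrative only; not necessarily how `ndvi.py` derives its baseline range):

```python
from datetime import date


def baseline_window(current_start: date, years: int) -> tuple[date, date]:
    """The `years` calendar years immediately preceding the current range.

    Note: a Feb 29 start date would need leap-year handling.
    """
    return current_start.replace(year=current_start.year - years), current_start
```

With `years=5` and a current range starting 2025-03-01, the baseline spans 2020-03-01 to 2025-03-01.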

- [ ] **Step 2: Run tests**

  Run: `pytest tests/ -x -q`
  Expected: All tests pass (tests don't depend on this value)

- [ ] **Step 3: Commit and push**

```bash
git add app/indicators/ndvi.py
git commit -m "feat: restore 5-year NDVI baseline now that batch jobs handle the load"
git push origin main && git push hf main
```