KSvend, Claude, Happy committed on

Commit 440823e · 1 parent: cb6d0e1

docs: spec for SAR log10 fix and CDSE sequential processing

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

docs/superpowers/specs/2026-04-02-sar-log10-cdse-sequential-design.md ADDED

# SAR log10 Fix + CDSE Sequential Processing

**Date:** 2026-04-02
**Status:** Approved

## Problem

Three failures were observed when running all four batch indicators (NDVI, SAR, water, buildup) on HF Spaces:

1. **SAR log10 crash:** `'ProcessBuilder' object has no attribute 'log10'`. The SAR graph builder calls `cube.apply(lambda x: 10.0 * x.log10())`, but inside an `apply()` callback `x` is a `ProcessBuilder`, not a `DataCube`. `ProcessBuilder` exposes openEO processes, and openEO defines `log` (with a `base` argument) and `ln` but no `log10` process, so the builder has no `log10()` method.

2. **CDSE concurrency overload:** Submitting 9 or more batch jobs at once (3 per indicator) overwhelms CDSE. In production logs, the buildup jobs never left "queued" during the 20-minute batch timeout, and one water job stayed "queued" the entire time.

3. **Cascade failure:** Batch jobs abandoned at the timeout still consume CDSE resources, so the sync fallbacks that follow also time out (read timeout = 1800 s) or return 500 Internal Server Error.

## Fix 1: SAR log10

**File:** `app/openeo_client.py`, line 245

**Before:**
```python
# Fails at graph-build time: inside apply(), x is a ProcessBuilder with no log10()
cube = cube.apply(lambda x: 10.0 * x.log10())
```

**After:**
```python
# Log and scaling are applied at DataCube level, so no callback is involved
cube = 10.0 * cube.log10()
```

`DataCube.log10()` is a documented method in the openEO Python client API. It applies the base-10 logarithm to each element and returns a new `DataCube`. Scalar arithmetic such as `10.0 * cube` is also supported directly on a `DataCube` and yields another `DataCube`.
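
For reference, the fixed expression is the standard decibel conversion of linear backscatter, σ⁰(dB) = 10 · log10(σ⁰). The stdlib sketch below shows the same per-pixel arithmetic. `linear_to_db` is a hypothetical helper for illustration, not part of `app/openeo_client.py`, and mapping non-positive inputs to NaN is an assumption; the backend's handling of such values may differ.

```python
import math


def linear_to_db(value: float) -> float:
    """Convert a linear backscatter value to decibels: 10 * log10(value).

    Hypothetical helper mirroring what `10.0 * cube.log10()` computes per pixel.
    Non-positive inputs have no real logarithm; returning NaN is an assumption
    made for this sketch, not documented openEO backend behaviour.
    """
    if value <= 0:
        return math.nan
    return 10.0 * math.log10(value)


# Typical linear backscatter magnitudes and their dB equivalents.
for sigma0 in (1.0, 0.1, 0.01):
    print(f"{sigma0:>5} -> {linear_to_db(sigma0):.1f} dB")
```

Each factor of 10 in linear backscatter corresponds to a 10 dB step, which is why SAR values are usually analysed on the dB scale.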

## Fix 2: Sequential Indicator Processing

**File:** `app/worker.py`

### Current architecture (3-phase)

1. **Phase 1:** Submit the jobs for all batch indicators to CDSE
2. **Phase 2:** Poll all pending jobs in a shared loop until all finish or time out
3. **Phase 3:** Harvest results for all indicators

This submits up to 12 concurrent CDSE jobs (4 indicators × 3 jobs each), which runs into CDSE's per-user concurrency limits.

### New architecture (sequential per-indicator)

For each batch indicator, in order:

1. Submit its batch jobs (up to 3)
2. Poll until those jobs finish, error, or time out
3. Harvest results (or fall back to sync `process()`)
4. Save results and update progress
5. Move to the next indicator

Non-batch indicators (fires, rainfall, nightlights) run after all batch indicators complete, as they do today.
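
The per-indicator loop can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in `app/worker.py`: the `Backend` protocol and its `submit`, `status`, `harvest`, and `sync_process` methods are hypothetical stand-ins for the real openEO calls, `POLL_INTERVAL` is an assumed value, and the sketch treats harvesting as all-or-nothing (any unfinished job triggers the sync fallback).

```python
import time
from typing import Callable, Protocol

BATCH_TIMEOUT = 2400  # seconds, applied per indicator rather than globally
POLL_INTERVAL = 30    # seconds between status checks (assumed value)
TERMINAL = {"finished", "error", "canceled"}  # openEO batch job end states


class Backend(Protocol):
    """Hypothetical wrapper around the openEO/CDSE calls the worker makes."""

    def submit(self, indicator: str) -> list[str]: ...
    def status(self, job_id: str) -> str: ...
    def harvest(self, indicator: str, job_ids: list[str]) -> dict: ...
    def sync_process(self, indicator: str) -> dict: ...


def run_batch_indicators(
    backend: Backend,
    indicators: list[str],
    save: Callable[[str, dict], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Process batch indicators one at a time (hypothetical sketch)."""
    for indicator in indicators:
        # 1. Submit only this indicator's jobs (up to 3).
        job_ids = backend.submit(indicator)
        deadline = clock() + BATCH_TIMEOUT

        # 2. Poll until every job is terminal or this indicator's deadline passes.
        while True:
            statuses = [backend.status(job_id) for job_id in job_ids]
            if all(s in TERMINAL for s in statuses) or clock() >= deadline:
                break
            sleep(POLL_INTERVAL)

        # 3. Harvest if all jobs finished; otherwise fall back to sync processing.
        if all(s == "finished" for s in statuses):
            result = backend.harvest(indicator, job_ids)
        else:
            result = backend.sync_process(indicator)

        # 4. Save results/progress; 5. the loop then moves to the next indicator.
        save(indicator, result)
```

Non-batch indicators would run after this function returns. The injectable `clock` and `sleep` parameters are a testability choice: a fake clock lets a unit test drive the 2400 s timeout instantly instead of waiting 40 minutes.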

### Timeout change

- `BATCH_TIMEOUT`: 1200 s (20 min) → 2400 s (40 min)
- This timeout now applies **per indicator**, not globally
- Rationale: NDVI alone took ~19 min when run solo with no contention. With sequential processing, an indicator no longer competes with the other indicators' jobs, so most should finish well within 40 min.
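
Because the timeout is now per indicator, the polling phase alone (before any harvest or sync fallback) has the following worst case for the four batch indicators, compared with 1200 s under the old global timeout. Here $N_{\text{batch}}$ is the number of batch indicators and $T_{\text{batch}}$ is `BATCH_TIMEOUT`:

$$
T_{\text{poll}}^{\max} = N_{\text{batch}} \times T_{\text{batch}} = 4 \times 2400\ \text{s} = 9600\ \text{s} = 160\ \text{min}
$$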
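
### Concurrency impact

- **Before:** Up to 12 concurrent CDSE jobs
- **After:** At most 3 concurrent CDSE jobs at any time, provided jobs abandoned at a timeout stop running (see "Cascade failure" under Problem)
- Total wall-clock time increases because indicators run one after another instead of in parallel, but reliability should improve substantially, since far fewer simultaneous jobs make CDSE throttling and queueing much less likely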
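
The before/after concurrency figures can be checked with a small sweep-line count over job intervals. In this sketch the job durations are made-up placeholders, not measurements; each indicator submits 3 jobs; every job is assumed to run to completion (no abandoned jobs); and under the sequential schedule an indicator starts only when the previous indicator's slowest job ends.

```python
def peak_concurrency(intervals: list[tuple[float, float]]) -> int:
    """Maximum number of half-open [start, end) intervals active at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # job becomes active
        events.append((end, -1))    # job finishes
    # At equal timestamps, -1 sorts before +1, so back-to-back jobs don't overlap.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def parallel_schedule(durations: list[list[float]]) -> list[tuple[float, float]]:
    """Old 3-phase flow: every indicator's jobs are submitted at t=0."""
    return [(0.0, d) for jobs in durations for d in jobs]


def sequential_schedule(durations: list[list[float]]) -> list[tuple[float, float]]:
    """New flow: each indicator starts when the previous one's last job ends."""
    intervals, t = [], 0.0
    for jobs in durations:
        intervals += [(t, t + d) for d in jobs]
        t += max(jobs)
    return intervals


# Placeholder durations in minutes: 4 batch indicators x 3 jobs each.
durations = [[19, 15, 12], [10, 9, 8], [14, 11, 7], [16, 13, 6]]
print(peak_concurrency(parallel_schedule(durations)))    # 12
print(peak_concurrency(sequential_schedule(durations)))  # 3
```

The peak drops from the sum of all jobs to the largest per-indicator job count, regardless of the particular durations chosen.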

## Testing

### Automated tests

- **`test_openeo_client.py`**: The existing SAR graph builder test exercises the fixed `cube.log10()` path; the mock DataCube supports this method.
- **`test_worker.py`**: The existing batch flow test uses mocks that return immediately, so the sequential structure is transparent to it. Adjust assertions if the test checks phase ordering.
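
One caveat worth checking: if the mock DataCube is a plain `unittest.mock.MagicMock`, any attribute access succeeds and a mocked `apply()` never calls its lambda, so such a test could not have caught the original `ProcessBuilder` error. A sketch of a stricter stand-in follows. `FakeCube`, `PROCESS_BUILDER_ATTRS`, `old_sar_step`, and `new_sar_step` are hypothetical names that only approximate the real openEO classes and the SAR builder; passing `spec=` makes the mock raise `AttributeError` for any attribute not listed.

```python
from unittest.mock import Mock

# Hypothetical subset of the callback argument's methods. openEO defines
# `log` (with a `base` argument) and `ln` processes, but no `log10`.
PROCESS_BUILDER_ATTRS = ["log", "ln", "multiply", "add"]


class FakeCube:
    """Hypothetical DataCube test double whose apply() really runs the callback."""

    def apply(self, process):
        # spec= restricts the mock to the listed attributes, so x.log10 fails
        # the same way it does against a real ProcessBuilder.
        process(Mock(spec=PROCESS_BUILDER_ATTRS))
        return self

    def log10(self):
        return self

    def __rmul__(self, factor):
        # Supports `10.0 * cube` by returning a cube-like object.
        return self


def old_sar_step(cube):
    return cube.apply(lambda x: 10.0 * x.log10())


def new_sar_step(cube):
    return 10.0 * cube.log10()
```

With this double, `old_sar_step(FakeCube())` raises `AttributeError`, reproducing the production failure, while `new_sar_step(FakeCube())` succeeds.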

### Manual verification (post-deploy)

Confirm real CDSE execution on HF Spaces by running each batch indicator solo, then all of them together:

1. NDVI only
2. SAR only (completely broken before because of the `ProcessBuilder` crash)
3. Water only
4. Buildup only
5. Combined run with all 4 batch indicators

## Files Changed

| File | Change |
|------|--------|
| `app/openeo_client.py` | Fix `log10` call on line 245 |
| `app/worker.py` | Replace 3-phase batch architecture with sequential per-indicator loop; bump `BATCH_TIMEOUT` to 2400 s |

## Out of Scope

- Changing the order of log10 vs. median aggregation (SAR methodology)
- Automated integration tests against live CDSE
- `download_results` deprecation warnings (cosmetic, not blocking)