KSvend, Claude, and Happy committed
Commit 440823e · Parent: cb6d0e1
docs: spec for SAR log10 fix and CDSE sequential processing

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

docs/superpowers/specs/2026-04-02-sar-log10-cdse-sequential-design.md (added)
# SAR log10 Fix + CDSE Sequential Processing

**Date:** 2026-04-02
**Status:** Approved

## Problem

Three failures observed when running all four batch indicators (NDVI, SAR, water, buildup) on HF Spaces:

1. **SAR log10 crash** — `'ProcessBuilder' object has no attribute 'log10'`. The SAR graph builder calls `cube.apply(lambda x: 10.0 * x.log10())`, but `x` inside an `apply()` lambda is a `ProcessBuilder`, not a `DataCube`, and `ProcessBuilder` does not have a `log10()` method.

2. **CDSE concurrency overload** — Submitting 9+ batch jobs simultaneously (3 per indicator) overwhelms CDSE. In production logs, buildup jobs never left "queued" within 20 minutes, and water had one job stuck in "queued" the entire time. After the 20-minute batch timeout, sync fallbacks also failed with CDSE read timeouts and 500 errors.

3. **Cascade failure** — Timed-out batch jobs still consume CDSE resources, causing sync fallbacks to also time out (read timeout=1800s) or return 500 Internal Server Error.

## Fix 1: SAR log10

**File:** `app/openeo_client.py`, line 245

**Before:**

```python
cube = cube.apply(lambda x: 10.0 * x.log10())
```

**After:**

```python
cube = 10.0 * cube.log10()
```

`DataCube.log10()` is a documented method in the openEO Python client API. It applies the base-10 logarithm to each element and returns a new `DataCube`. Arithmetic operations (`10.0 * cube`) are supported at the DataCube level.
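The element-wise math the fixed expression asks the backend to perform is the standard linear-to-decibel conversion. A minimal local sketch, using plain `math` as a stand-in for the backend computation (the function name is illustrative, not from the codebase):

```python
import math

def sar_to_db(linear: float) -> float:
    """Convert a linear SAR backscatter value to decibels: 10 * log10(x).

    Mirrors what the fixed openEO expression `10.0 * cube.log10()`
    performs per pixel on the backend.
    """
    return 10.0 * math.log10(linear)

# A linear backscatter of 0.1 corresponds to -10 dB.
print([sar_to_db(v) for v in (1.0, 0.1, 0.01)])  # [0.0, -10.0, -20.0]
```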

## Fix 2: Sequential Indicator Processing

**File:** `app/worker.py`

### Current architecture (3-phase)

1. **Phase 1:** Submit all batch indicators' jobs to CDSE
2. **Phase 2:** Poll all pending jobs in a shared loop until all finish or time out
3. **Phase 3:** Harvest results for all indicators

This submits up to 12 concurrent CDSE jobs (4 indicators × 3 jobs each), overwhelming CDSE's per-user concurrency limits.

### New architecture (sequential per-indicator)

For each batch indicator, in order:

1. Submit its batch jobs (up to 3)
2. Poll until those jobs finish, error, or time out
3. Harvest results (or fall back to sync `process()`)
4. Save results and update progress
5. Move to the next indicator

Non-batch indicators (fires, rainfall, nightlights) run after all batch indicators complete, same as today.
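The per-indicator loop above can be sketched as follows. All helper names (`submit`, `poll`, `harvest`, `sync_fallback`, `save`) are hypothetical stand-ins for the real worker functions; this only illustrates the control flow, not the actual `app/worker.py` API:

```python
BATCH_TIMEOUT = 2400  # seconds, applied per indicator

def process_batch_indicators(indicators, submit, poll, harvest, sync_fallback, save):
    """Run each batch indicator to completion before starting the next one."""
    for indicator in indicators:
        jobs = submit(indicator)                      # up to 3 CDSE jobs
        finished = poll(jobs, timeout=BATCH_TIMEOUT)  # blocks until done/error/timeout
        if finished:
            results = harvest(jobs)
        else:
            results = sync_fallback(indicator)        # sync process() fallback
        save(indicator, results)

# Usage with trivial stubs, to show the strict ordering:
order = []
process_batch_indicators(
    ["ndvi", "sar", "water", "buildup"],
    submit=lambda ind: [f"{ind}-job-{i}" for i in range(3)],
    poll=lambda jobs, timeout: True,
    harvest=lambda jobs: {"jobs": jobs},
    sync_fallback=lambda ind: {},
    save=lambda ind, res: order.append(ind),
)
print(order)  # ['ndvi', 'sar', 'water', 'buildup']
```

Because the loop body finishes one indicator entirely before submitting the next, at most one indicator's jobs (≤3) are ever in flight.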

### Timeout change

- `BATCH_TIMEOUT`: 1200s (20 min) -> 2400s (40 min)
- This timeout is now **per indicator**, not global
- Rationale: NDVI alone took ~19 min when run solo with no contention. With sequential processing, each indicator gets full CDSE bandwidth, so most should finish well within 40 min.
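A per-indicator deadline can be enforced with a monotonic clock around the poll loop. This is a sketch under assumptions: `get_states` is a hypothetical callable returning the current CDSE job states, and the state strings are illustrative:

```python
import time

def poll_jobs(get_states, timeout_s=2400, interval_s=0.0):
    """Poll until no job is still queued/running, or the per-indicator
    deadline passes. Returns True if all jobs settled in time."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        states = get_states()
        if all(s not in ("queued", "running") for s in states):
            return True
        time.sleep(interval_s)
    return False

# Usage: a stub backend whose jobs finish on the second poll.
calls = {"n": 0}
def fake_states():
    calls["n"] += 1
    return ["running", "queued"] if calls["n"] == 1 else ["finished", "finished"]

print(poll_jobs(fake_states))  # True
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments.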

### Concurrency impact

- **Before:** Up to 12 concurrent CDSE jobs
- **After:** Max 3 concurrent CDSE jobs at any time
- Total wall-clock time increases (sequential vs parallel), but reliability is dramatically improved since CDSE won't throttle/queue jobs

## Testing

### Automated tests

- **`test_openeo_client.py`**: Existing SAR graph builder test exercises the fixed `cube.log10()` path. The mock DataCube supports this method.
- **`test_worker.py`**: Existing batch flow test uses mocks that return immediately, so the sequential structure is transparent to mocks. Adjust assertions if the test checks phase ordering.
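A mock DataCube only needs `log10()` and reverse multiplication for the fixed expression to type-check. The sketch below is illustrative, not the project's actual test double:

```python
class FakeDataCube:
    """Minimal stand-in for an openEO DataCube, recording applied operations."""

    def __init__(self, ops=()):
        self.ops = list(ops)

    def log10(self):
        return FakeDataCube(self.ops + ["log10"])

    def __rmul__(self, factor):  # supports `10.0 * cube`
        return FakeDataCube(self.ops + [f"mul:{factor}"])

cube = FakeDataCube()
cube = 10.0 * cube.log10()  # the fixed expression from openeo_client.py
print(cube.ops)  # ['log10', 'mul:10.0']
```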

### Manual verification (post-deploy)

Run each indicator solo on HF Spaces to confirm real CDSE execution:

1. NDVI only
2. SAR only (was completely broken before — `ProcessBuilder` crash)
3. Water only
4. Buildup only
5. Combined run with all 4 batch indicators

## Files Changed

| File | Change |
|------|--------|
| `app/openeo_client.py` | Fix `log10` call on line 245 |
| `app/worker.py` | Replace 3-phase batch architecture with sequential per-indicator loop; bump `BATCH_TIMEOUT` to 2400 |

## Out of Scope

- Changing the order of log10 vs median aggregation (SAR methodology)
- Automated integration tests against live CDSE
- `download_results` deprecation warnings (cosmetic, not blocking)