# Landing Site Safety Analyzer – Architecture and Calculations
This document describes the flow in the current Gradio app (`app/ui.py`), from input selection through model inference, safety scoring, and UI composition.
## Data and Models
- **Inputs**: Images under `data/Image/` (VISLOC and any custom folders), discovered via `list_all_data_inputs`. Each image gets a 5% border crop (`crop_nonblack`) to drop black padding. Supported extensions: jpg/jpeg/png (any case).
- **Depth model**: Depth Anything 3, cached per model id (`DepthEngine`). Inference caps the long side to `process_res_cap` (default 1024) using `upper_bound_resize` before predicting.
- **Segmentation model**: SAM3 (`facebook/sam3`) for promptable water/road/tree/roof masking. The segmenter is cached per model id but masks are recomputed every run (no output cache). Default `segmentation_max_side` is 512 and is clamped to the depth resolution (min 128).
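The two resolution rules above can be sketched as follows. This is a minimal illustration, not the app's code: `cap_long_side` and `clamp_segmentation_side` are hypothetical names, and the real `upper_bound_resize` may round or resample differently.

```python
def cap_long_side(width, height, cap=1024):
    """Scale (width, height) down so the long side is at most `cap`.

    Assumption: images already within the cap are left untouched (no upscaling).
    """
    long_side = max(width, height)
    if long_side <= cap:
        return width, height
    scale = cap / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def clamp_segmentation_side(requested, depth_long_side, minimum=128):
    """Keep the segmentation side no larger than the depth resolution, with a floor of 128."""
    return max(minimum, min(requested, depth_long_side))
```

For example, a 4000x3000 input would be processed at 1024x768 for depth, and a requested segmentation side of 512 would stay at 512 because it is already below the depth long side.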
## Constants and Defaults
- Altitude/FOV defaults: 450 m, 90° (footprint default 10 m).
- Thresholds: `std_thresh` default 0.005, `grad_thresh` default 0.1; both auto-scale with depth resolution so sliders act as base values.
- Clearance factor: default 1.0 (dilates hazards by the footprint size).
- Coverage strictness: default 0.95 (fraction of the footprint that must be safe).
- Texture threshold: default 0.3 (suppresses highly textured regions).
- Depth smoothing is supported but set to 0.0 in the UI (effectively off).
- Roof mask: SAM3 promptable segmentation (default prompt: `roof`), resized to depth scale and expanded to footprint size; no depth-based roof heuristics remain.
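For reference, the defaults listed above could be collected in one immutable config object. This is only a sketch: the class name `AnalyzerDefaults` and the field `roof_prompt` are hypothetical, and the app may store these values differently (for example, directly in the Gradio slider definitions).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnalyzerDefaults:
    """Documented defaults; the class itself is illustrative, not part of the app."""
    altitude_m: float = 450.0
    fov_deg: float = 90.0
    footprint_m: float = 10.0
    std_thresh: float = 0.005          # base value, auto-scaled with depth resolution
    grad_thresh: float = 0.1           # base value, auto-scaled with depth resolution
    clearance_factor: float = 1.0
    coverage_strictness: float = 0.95
    texture_threshold: float = 0.3
    depth_smoothing_base: float = 0.0  # effectively off in the UI
    process_res_cap: int = 1024
    segmentation_max_side: int = 512
    roof_prompt: str = "roof"          # hypothetical field name for the default prompt


defaults = AnalyzerDefaults()
low_flight = replace(defaults, altitude_m=120.0)  # override a single value
```

A frozen dataclass keeps the defaults read-only, so a per-run override has to go through `replace` and cannot silently mutate shared state.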
## Per-Image Processing Pipeline
1. **Load and crop** the selected image (RGB, 5% border removed).
2. **Depth inference**: Run DA3 with long side clamped to `process_res_cap`; obtain `depth_raw`, then detrend with `remove_global_plane`. Optional Gaussian blur uses `depth_smoothing_base * res_scale` (currently zero).
3. **Footprint sizing**:
- `fx = (W/2) / tan(FOV/2)` where `W` is depth width.
- `patch_px = footprint_m * fx / altitude_m`, clamped to bounds and forced odd; `half_span = patch_px//2`.
- Visualization window `vis_patch` is an odd size capped to 1/8 of the smaller depth dimension for sharper std previews.
4. **Texture mask**: Sobel magnitude on the RGB image (blurred by `patch_px/40`), normalized; pixels above `texture_threshold` are excluded from the safe mask.
5. **Segmentation masks (optional)**:
- Water/Road/Tree/Roof via SAM3 at `segmentation_max_side`, with text prompts. Instance masks are unioned per class, resized to depth scale, and dilated to footprint size for blocking.
6. **Flat region search (`pick_flat_patch`)**:
- Normalize depth to [0,1], compute `std_map` from box-filtered mean and mean-of-squares (`std = sqrt(mean_sq - mean^2)`), and compute `grad_norm` via `np.gradient`, normalized at the 95th percentile.
- Landing mask starts from `grad_norm < grad_thresh_eff`, excludes water if present, and keeps the lowest-variance patch as a fallback box.
7. **Safe mask construction**:
- Base safe mask: `(std_map < std_thresh_eff) & (grad_norm < grad_thresh_eff) & landing_mask & texture_mask`.
- Apply segmentation blocks (expanded masks) to remove water/road/tree/roof regions.
- Clearance: dilate hazards by `clearance_factor * patch_px` (default 1.0).
- Coverage: box filter with `patch_px` window; keep pixels meeting `coverage_strictness` (default 0.95).
- Drop small components (< footprint area).
8. **Center selection**:
- Prefer centers where full-footprint coverage exists; choose the largest component and rank by distance transform minus flatness penalty (`openness_weight`).
- If no full coverage but safe pixels exist, pick the safest point inside the safe mask (distance vs. flatness).
- Fallbacks: landing mask with segmentation removed; if empty, use the flattest patch center.
- Convert depth center to image space; footprint box is scaled to image pixels and clamped to a minimum of 3 px.
9. **Visualization layers**:
- Depth colormap from `depth_raw`.
- Flatness std preview (`std_map_vis`), gradient magnitude, gradient mask, flatness heatmap overlay.
- Water/Road/Tree/Roof masks and per-class hazard overlays.
- Safety overlays: green safe heatmap, red hazard overlay from `risk_map`, grayscale safety score, landing spot box/crosshair.
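The numeric core of steps 3, 6, and 7 can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the app's implementation: the function names are hypothetical, the clamp bounds (`min_px`, `max_px`) are guesses because the document only says "clamped to bounds", the box filter uses an integral image with edge padding where the app reportedly uses OpenCV, and the landing, texture, segmentation, and clearance masks are left out.

```python
import math
import numpy as np


def footprint_px(width_px, footprint_m, altitude_m, fov_deg, min_px=3, max_px=None):
    """Step 3: pinhole-model footprint size in depth pixels, forced odd."""
    fx = (width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    patch = int(round(footprint_m * fx / altitude_m))
    upper = width_px if max_px is None else max_px  # assumed upper bound
    patch = max(min_px, min(patch, upper))
    if patch % 2 == 0:
        # Force odd so the window has a center pixel, staying within the bound.
        patch = patch + 1 if patch < upper else patch - 1
    return patch


def box_mean(img, k):
    """Mean over a k x k window (k odd) via an integral image with edge padding."""
    r = k // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def local_std(depth01, k):
    """Step 6: std from box mean and box mean-of-squares."""
    mean = box_mean(depth01, k)
    mean_sq = box_mean(depth01 ** 2, k)
    return np.sqrt(np.clip(mean_sq - mean ** 2, 0.0, None))


def gradient_norm(depth01):
    """Step 6: gradient magnitude divided by its 95th percentile."""
    gy, gx = np.gradient(depth01)
    mag = np.hypot(gx, gy)
    scale = np.percentile(mag, 95)
    return mag / scale if scale > 0 else mag


def safe_masks(depth, k, std_thresh, grad_thresh, coverage):
    """Step 7 (partial): base safe mask, then the footprint coverage test."""
    span = float(depth.max() - depth.min())
    depth01 = (depth - depth.min()) / span if span > 0 else np.zeros(depth.shape)
    base = (local_std(depth01, k) < std_thresh) & (gradient_norm(depth01) < grad_thresh)
    covered = base & (box_mean(base.astype(np.float64), k) >= coverage)
    return base, covered
```

With the documented defaults and a 1024 px wide depth map, `footprint_px(1024, 10.0, 450.0, 90.0)` gives `fx = 512` and an 11 px footprint. On a synthetic depth map with a single step edge, `safe_masks` rejects the pixels on the edge in `base` and additionally rejects nearby pixels in `covered`, because their footprint window overlaps the edge.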
## Safety Heatmap and Hazards
- `safe_mask` drives the green overlay (alpha per pixel).
- Hazard overlay uses `risk_map` (max of std/grad over-threshold). Pixels above `risk_threshold` are emphasized; water/road/tree hazards can also be overlaid separately.
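One plausible reading of "max of std/grad over-threshold" is sketched below. The exact formula is an assumption: here each term measures how far a value exceeds its threshold, relative to that threshold, and the result is clipped to [0, 1]. The function name and the clipping are illustrative, and the app's `risk_map` may be computed differently.

```python
import numpy as np


def risk_from_excess(std_map, grad_norm, std_thresh, grad_thresh):
    """Per-pixel risk in [0, 1]: the larger of the two relative over-threshold terms."""
    std_excess = np.clip(std_map / std_thresh - 1.0, 0.0, None)
    grad_excess = np.clip(grad_norm / grad_thresh - 1.0, 0.0, None)
    return np.clip(np.maximum(std_excess, grad_excess), 0.0, 1.0)


std_map = np.array([[0.0, 0.01]])
grad_norm = np.array([[0.05, 0.05]])
risk = risk_from_excess(std_map, grad_norm, std_thresh=0.005, grad_thresh=0.1)
```

In this example the first pixel is below both thresholds and gets zero risk, while the second has twice the allowed std and saturates at full risk.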
## Overlay Composition (`compose_view`)
- Base view: one of the named layers (RGB/Depth/Flatness/Gradient/Gradient mask/Water mask/Road mask/Tree mask/Safety score/Safety heatmap overlay).
- Overlays: safety heatmap, hazard heatmap, per-class hazards (water/road/tree), gradient, optional landing spot box. Fixed alpha values; toggling overlays does not rerun inference.
- Returned image is RGB.
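Because overlays are blended onto cached layers, composition amounts to alpha blending. A minimal sketch follows, assuming uint8 RGB arrays and either a fixed scalar alpha or a per-pixel alpha map. `blend_overlay` is a hypothetical helper, not the signature of `compose_view`.

```python
import numpy as np


def blend_overlay(base_rgb, overlay_rgb, alpha):
    """Blend an overlay onto a base image.

    alpha is a scalar or an HxW float array in [0, 1]; overlay_rgb is an
    HxWx3 array or a single RGB color. Returns a uint8 RGB image.
    """
    base = np.asarray(base_rgb, dtype=np.float32)
    over = np.asarray(overlay_rgb, dtype=np.float32)
    a = np.asarray(alpha, dtype=np.float32)
    if a.ndim == 2:
        a = a[..., None]  # broadcast per-pixel alpha across the color channels
    out = base * (1.0 - a) + over * a
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


base = np.zeros((2, 2, 3), dtype=np.uint8)
safe_alpha = np.array([[0.0, 0.2], [1.0, 0.0]])
preview = blend_overlay(base, (0, 255, 0), safe_alpha)  # green safe overlay
```

Since this operates only on already-rendered arrays, toggling an overlay needs no model call.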
## Caching and State
- Depth model cache keyed by model id (`DepthEngine`); default model is preloaded.
- SAM3 models are cached per id; masks are not cached and are recomputed every run, so reported runtimes reflect the real segmentation cost.
- `images_state` holds the latest rendered layers; overlay-only changes don’t rerun inference. Prompt changes re-trigger processing only on submit/Run, not on every keystroke.
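The per-id model caching described above follows a common pattern, sketched here in isolation. `ModelCache` and the loader callable are hypothetical; the actual `DepthEngine` and SAM3 loading code are not shown in this document.

```python
class ModelCache:
    """Load each model id at most once and reuse the instance afterwards."""

    def __init__(self, loader):
        self._loader = loader  # callable: model_id -> loaded model
        self._models = {}

    def get(self, model_id):
        if model_id not in self._models:
            self._models[model_id] = self._loader(model_id)
        return self._models[model_id]


load_calls = []


def fake_loader(model_id):
    """Stand-in for an expensive model load; records each call."""
    load_calls.append(model_id)
    return {"id": model_id}


cache = ModelCache(fake_loader)
first = cache.get("facebook/sam3")
second = cache.get("facebook/sam3")  # served from the cache, no second load
```

Only the model object is cached here. Outputs such as masks would still be recomputed on each run, matching the behavior described above.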
## User Controls and Effects
- `process_res_cap`: depth max side (px) for DA3.
- `footprint_m`, `altitude_m`, `fov_deg`: determine footprint size in pixels.
- `std_thresh`, `grad_thresh`, `texture_threshold`: safety criteria.
- `clearance_factor`: hazard dilation (default 1.0).
- `coverage_strictness`, `openness_weight`: coverage tolerance and center ranking.
- Segmentation toggles, prompts, `segmentation_max_side`, `segmentation_score_thresh`, `segmentation_mask_thresh`: control SAM3 masks for water/road/tree/roof.
- Overlay toggles affect only display; inference results are reused.
## Error Handling
- Bad/missing inputs raise Gradio errors.
- Segmentation failures warn and proceed without that mask; water/road/tree/roof warnings clarify when masks are disabled or not detected.
- Coverage/boxFilter fallbacks keep processing even if OpenCV operations fail.
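The "warn and proceed" behavior for optional masks can be sketched as below. The helper name, the shape of the `segmenters` mapping, and the warning wording are all assumptions for illustration; the app's actual messages and control flow may differ.

```python
def collect_masks(segmenters):
    """Run each optional segmenter; on failure, record a warning and continue.

    `segmenters` maps a class name to a zero-argument callable that returns
    a mask, or None when nothing was detected.
    """
    masks, warnings = {}, []
    for name, run in segmenters.items():
        try:
            mask = run()
        except Exception as exc:  # keep the pipeline alive on any segmenter error
            warnings.append(f"{name} mask skipped: {exc}")
            continue
        if mask is None:
            warnings.append(f"{name} not detected")
        else:
            masks[name] = mask
    return masks, warnings


def failing_segmenter():
    raise RuntimeError("out of memory")


masks, warnings = collect_masks({
    "water": lambda: [[1]],
    "road": failing_segmenter,
    "roof": lambda: None,
})
```

Here the water mask is kept, the road failure becomes a warning, and the missing roof detection is reported without stopping the run.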
## Outputs
- Dict of PIL Images keyed by: RGB, Depth, Flatness map (std), Depth gradient, Gradient mask, Water mask, Road mask, Tree mask, Roof mask, Safety heatmap overlay, Hazard overlay, Water/Road/Tree hazard overlays, Flatness heatmap overlay, Safety score (grayscale), Landing spot overlay.
- Run summaries surface model id, process resolution, runtime, footprint size (depth + image scale), landing center, safe/hazard coverage, effective thresholds, per-mask coverage (water/road/tree/roof), and warnings for disabled/absent masks or missing safe regions; the UI cards render these fields directly.
- `compose_view` uses the image dict above to build the preview.