Landing Site Safety Analyzer – Architecture and Calculations

This document describes the flow in the current Gradio app (app/ui.py), from input selection through model inference, safety scoring, and UI composition.

Data and Models

  • Inputs: Images under data/Image/ (VISLOC and any custom folders) via list_all_data_inputs, with a 5% border crop (crop_nonblack) to drop black padding. Supported extensions: jpg/jpeg/png (any case).
  • Depth model: Depth Anything 3 (DA3), cached per model id (DepthEngine). Inference caps the long side at process_res_cap (default 1024) using upper_bound_resize before predicting.
  • Segmentation model: SAM3 (facebook/sam3) for promptable water/road/tree/roof masking. The segmenter is cached per model id but masks are recomputed every run (no output cache). Default segmentation_max_side is 512 and is clamped to the depth resolution (min 128).
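
The two resolution limits above can be illustrated with a minimal sketch. It assumes the cap never upscales and that "clamped to the depth resolution (min 128)" means the segmentation side is bounded between 128 and the depth long side. The helper names upper_bound_dims and clamp_seg_side are hypothetical and are not the app's upper_bound_resize.

```python
def upper_bound_dims(width: int, height: int, cap: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the long side is at most `cap`; never upscale."""
    long_side = max(width, height)
    if long_side <= cap:
        return width, height
    scale = cap / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def clamp_seg_side(requested: int, depth_long_side: int, floor: int = 128) -> int:
    """Bound the segmentation side by the depth long side, with a floor (assumed)."""
    return max(floor, min(requested, depth_long_side))


# A 4000x3000 image is processed at 1024x768 for depth; segmentation stays at 512.
depth_w, depth_h = upper_bound_dims(4000, 3000, cap=1024)
seg_side = clamp_seg_side(512, max(depth_w, depth_h))
```

If process_res_cap were lowered to 384, the same call would reduce the segmentation side to 384 as well.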

Constants and Defaults

  • Altitude/FOV defaults: 450 m, 90° (footprint default 10 m).
  • Thresholds: std_thresh default 0.005, grad_thresh default 0.1; both auto-scale with depth resolution so sliders act as base values.
  • Clearance factor: default 1.0 (dilates hazards by the footprint size).
  • Coverage strictness: default 0.95 (fraction of the footprint that must be safe).
  • Texture threshold: default 0.3 (suppresses highly textured regions).
  • Depth smoothing is supported but set to 0.0 in the UI (effectively off).
  • Roof mask: SAM3 promptable segmentation (default prompt: roof), resized to depth scale and expanded to footprint size; no depth-based roof heuristics remain.
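
For reference, the defaults listed above could be gathered in one immutable settings object. This is only a sketch: the class name AnalyzerDefaults and the field roof_prompt are hypothetical, and the app may store these values differently (for example directly on the Gradio sliders).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnalyzerDefaults:
    """Hypothetical container for the defaults listed in this section."""
    altitude_m: float = 450.0
    fov_deg: float = 90.0
    footprint_m: float = 10.0
    std_thresh: float = 0.005          # base value, scaled with depth resolution
    grad_thresh: float = 0.1           # base value, scaled with depth resolution
    clearance_factor: float = 1.0
    coverage_strictness: float = 0.95
    texture_threshold: float = 0.3
    depth_smoothing_base: float = 0.0  # effectively off in the UI
    roof_prompt: str = "roof"          # hypothetical field name


defaults = AnalyzerDefaults()
# Overrides produce a new object, so the shared defaults stay untouched.
small_drone = replace(defaults, footprint_m=5.0)
```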

Per-Image Processing Pipeline

  1. Load and crop the selected image (RGB, 5% border removed).
  2. Depth inference: Run DA3 with long side clamped to process_res_cap; obtain depth_raw, then detrend with remove_global_plane. Optional Gaussian blur uses depth_smoothing_base * res_scale (currently zero).
  3. Footprint sizing:
    • fx = (W/2) / tan(FOV/2) where W is depth width.
    • patch_px = footprint_m * fx / altitude_m, clamped to bounds and forced odd; half_span = patch_px//2.
    • Visualization window vis_patch is an odd size capped to 1/8 of the smallest depth dimension for sharper std previews.
  4. Texture mask: Sobel magnitude on the RGB (blurred by patch_px/40), normalized; pixels above texture_threshold are suppressed.
  5. Segmentation masks (optional):
    • Water/Road/Tree/Roof via SAM3 at segmentation_max_side, with text prompts. Instance masks are unioned per class, resized to depth scale, and dilated to footprint size for blocking.
  6. Flat region search (pick_flat_patch):
    • Normalize depth to [0,1], compute std_map from the box-filtered mean and mean_sq (std = sqrt(mean_sq - mean^2)), and compute grad_norm via np.gradient, normalized by its 95th percentile.
    • Landing mask starts from grad_norm < grad_thresh_eff, excludes water if present, and keeps the lowest-variance patch as a fallback box.
  7. Safe mask construction:
    • Base safe mask: (std_map < std_thresh_eff) & (grad_norm < grad_thresh_eff) & landing_mask & texture_mask.
    • Apply segmentation blocks (expanded masks) to remove water/road/tree/roof regions.
    • Clearance: dilate hazards by clearance_factor * patch_px (default 1.0).
    • Coverage: box filter with patch_px window; keep pixels meeting coverage_strictness (default 0.95).
    • Drop small components (< footprint area).
  8. Center selection:
    • Prefer centers where full-footprint coverage exists; choose the largest component and rank by distance transform minus flatness penalty (openness_weight).
    • If no full coverage but safe pixels exist, pick the safest point inside the safe mask (distance vs. flatness).
    • Fallbacks: landing mask with segmentation removed; if empty, use the flattest patch center.
    • Convert depth center to image space; footprint box is scaled to image pixels and clamped to a minimum of 3 px.
  9. Visualization layers:
    • Depth colormap from depth_raw.
    • Flatness std preview (std_map_vis), gradient magnitude, gradient mask, flatness heatmap overlay.
    • Water/Road/Tree/Roof masks and per-class hazard overlays.
    • Safety overlays: green safe heatmap, red hazard overlay from risk_map, grayscale safety score, landing spot box/crosshair.
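
The core of steps 6 and 7 can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the app's pick_flat_patch: the helpers box_mean, flatness_maps, and coverage_safe_mask are hypothetical, borders use edge padding, grad_norm is taken as the unclipped gradient magnitude, and the texture, segmentation, clearance, and small-component steps are left out.

```python
import numpy as np


def box_mean(img: np.ndarray, k: int) -> np.ndarray:
    """Mean over a k x k window (k odd) via an integral image, edge-padded."""
    r = k // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Integral image with a leading zero row/column so window sums are 4 lookups.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def flatness_maps(depth: np.ndarray, patch_px: int):
    """Return (std_map, grad_norm) for a depth map normalized to [0, 1]."""
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    d = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    mean = box_mean(d, patch_px)
    mean_sq = box_mean(d * d, patch_px)
    # Clip guards against tiny negative values from floating-point cancellation.
    std_map = np.sqrt(np.clip(mean_sq - mean * mean, 0.0, None))
    gy, gx = np.gradient(d)
    grad = np.hypot(gx, gy)
    p95 = np.percentile(grad, 95)
    grad_norm = grad / p95 if p95 > 0 else np.zeros_like(grad)
    return std_map, grad_norm


def coverage_safe_mask(std_map, grad_norm, std_thresh, grad_thresh,
                       patch_px, coverage=0.95):
    """Base safe mask, then keep pixels whose footprint window is mostly safe."""
    base = (std_map < std_thresh) & (grad_norm < grad_thresh)
    frac = box_mean(base.astype(np.float64), patch_px)
    return base & (frac >= coverage)
```

On a synthetic depth map that is flat on the left and noisy on the right, only the flat interior survives, because both the std test and the coverage test reject pixels near the noisy half.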

Safety Heatmap and Hazards

  • safe_mask drives the green overlay (alpha per pixel).
  • Hazard overlay uses risk_map (the per-pixel maximum of the std and gradient over-threshold terms). Pixels above risk_threshold are emphasized; water/road/tree hazards can also be overlaid separately.

Overlay Composition (compose_view)

  • Base view: one of the named layers (RGB/Depth/Flatness/Gradient/Gradient mask/Water mask/Road mask/Tree mask/Safety score/Safety heatmap overlay).
  • Overlays: safety heatmap, hazard heatmap, per-class hazards (water/road/tree), gradient, optional landing spot box. Fixed alpha values; toggling overlays does not rerun inference.
  • Returned image is RGB.
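
Layering overlays on a base view amounts to repeated alpha blending. The sketch below shows that idea only; blend_overlay and compose are hypothetical helpers, the alpha values are placeholders, and the real compose_view signature is not shown in this document.

```python
import numpy as np


def blend_overlay(base_rgb, mask, color, alpha=0.45):
    """Alpha-blend `color` onto `base_rgb` wherever `mask` is nonzero.

    `mask` may be boolean or a float weight in [0, 1] (per-pixel alpha).
    """
    base = base_rgb.astype(np.float32)
    weight = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None] * alpha
    tint = np.asarray(color, dtype=np.float32)
    out = base * (1.0 - weight) + tint * weight
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def compose(base_rgb, overlays):
    """Apply (mask, color, alpha) overlays in order; returns an RGB uint8 array."""
    out = base_rgb
    for mask, color, alpha in overlays:
        out = blend_overlay(out, mask, color, alpha)
    return out
```

Because the blend only reads cached masks and layers, toggling an overlay is a cheap array operation and never needs the depth or segmentation models.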

Caching and State

  • Depth model cache keyed by model id (DepthEngine); default model is preloaded.
  • SAM3 models are cached per id; masks are not cached and are recomputed every run to reflect real-time cost.
  • images_state holds the latest rendered layers; overlay-only changes don’t rerun inference. Prompt changes only re-trigger processing on submit/Run, not every keystroke.
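
The per-id model caching described above follows a common load-once pattern. The following is a minimal sketch of that pattern, not the DepthEngine code: ModelCache and fake_loader are hypothetical names, and the loader stands in for whatever actually constructs the model.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class ModelCache(Generic[T]):
    """Load a model once per id and reuse it on later requests."""

    def __init__(self, loader: Callable[[str], T]) -> None:
        self._loader = loader
        self._models: Dict[str, T] = {}

    def get(self, model_id: str) -> T:
        if model_id not in self._models:
            self._models[model_id] = self._loader(model_id)
        return self._models[model_id]


load_calls: List[str] = []


def fake_loader(model_id: str) -> object:
    """Stand-in for an expensive model load; records each call."""
    load_calls.append(model_id)
    return object()


cache = ModelCache(fake_loader)
first = cache.get("facebook/sam3")   # loads the model
second = cache.get("facebook/sam3")  # served from the cache
```

Only the model object is cached here. Masks would still be recomputed on every run, matching the behavior described above.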

User Controls and Effects

  • process_res_cap: depth max side (px) for DA3.
  • footprint_m, altitude_m, fov_deg: determine footprint size in pixels.
  • std_thresh, grad_thresh, texture_threshold: safety criteria.
  • clearance_factor: hazard dilation (default 1.0).
  • coverage_strictness, openness_weight: coverage tolerance and center ranking.
  • Segmentation toggles, prompts, segmentation_max_side, segmentation_score_thresh, segmentation_mask_thresh: control SAM3 masks for water/road/tree/roof.
  • Overlay toggles affect only display; inference results are reused.
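
As a worked example of how footprint_m, altitude_m, and fov_deg translate to pixels, the sketch below applies the pinhole relation from the footprint sizing step. The rounding rule and the clamp bounds (3 px minimum, depth width maximum) are assumptions, and footprint_px is a hypothetical helper name.

```python
import math


def footprint_px(footprint_m, altitude_m, fov_deg, depth_width,
                 min_px=3, max_px=None):
    """Return (patch_px, half_span) for a square footprint, patch_px forced odd."""
    # Focal length in pixels from the horizontal field of view.
    fx = (depth_width / 2) / math.tan(math.radians(fov_deg) / 2)
    patch_px = int(round(footprint_m * fx / altitude_m))
    upper = max_px if max_px is not None else depth_width
    patch_px = max(min_px, min(patch_px, upper))
    if patch_px % 2 == 0:
        # Force odd, stepping down instead of up when already at the upper bound.
        patch_px += 1 if patch_px + 1 <= upper else -1
    return patch_px, patch_px // 2


patch_px, half_span = footprint_px(10.0, 450.0, 90.0, depth_width=1024)
```

With the documented defaults (10 m footprint, 450 m altitude, 90° FOV) and a 1024 px wide depth map, fx is about 512 px, so the footprint is roughly 11.4 px and this sketch returns patch_px = 11 with half_span = 5.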

Error Handling

  • Bad/missing inputs raise Gradio errors.
  • Segmentation failures warn and proceed without that mask; water/road/tree/roof warnings clarify when masks are disabled or not detected.
  • Coverage/boxFilter fallbacks keep processing even if OpenCV operations fail.

Outputs

  • Dict of PIL Images keyed by: RGB, Depth, Flatness map (std), Depth gradient, Gradient mask, Water mask, Road mask, Tree mask, Roof mask, Safety heatmap overlay, Hazard overlay, Water/Road/Tree hazard overlays, Flatness heatmap overlay, Safety score (grayscale), Landing spot overlay.
  • Run summaries surface model id, process resolution, runtime, footprint size (depth + image scale), landing center, safe/hazard coverage, effective thresholds, per-mask coverage (water/road/tree/roof), and warnings for disabled/absent masks or missing safe regions; the UI cards render these fields directly.
  • compose_view uses these to build the preview.