
drone-landing-safety / ARCHITECTURE.md

yakvrz

Switch rooftop masking to SAM3 and refresh demos

c5794e7 18 days ago


Landing Site Safety Analyzer – Architecture and Calculations

This document describes the flow in the current Gradio app (app/ui.py), from input selection through model inference, safety scoring, and UI composition.

Data and Models

Inputs: Images under data/Image/ (VISLOC and any custom folders) via list_all_data_inputs, with a 5% border crop (crop_nonblack) to drop black padding. Supported extensions: jpg/jpeg/png (any case).
Depth model: Depth Anything 3, cached per model id (DepthEngine). Inference caps the long side to process_res_cap (default 1024) using upper_bound_resize before predicting.
Segmentation model: SAM3 (facebook/sam3) for promptable water/road/tree/roof masking. The segmenter is cached per model id, but masks are recomputed on every run (there is no output cache). segmentation_max_side defaults to 512 and is clamped so that it does not exceed the depth resolution and does not fall below 128.
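The two resolution limits above can be sketched as follows. This is a minimal illustration rather than the app's code: `cap_long_side` stands in for the behaviour that `upper_bound_resize` is described as having, and `clamp_segmentation_side` is a hypothetical helper. The rounding rule and the choice to clamp against the depth map's long side are assumptions.

```python
def cap_long_side(width, height, cap=1024):
    """Return (w, h) scaled down so that max(w, h) <= cap; never upscales.

    Assumption: aspect ratio is preserved and sizes are rounded to the
    nearest pixel.
    """
    long_side = max(width, height)
    if long_side <= cap:
        return width, height
    scale = cap / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def clamp_segmentation_side(requested, depth_long_side, floor=128):
    """Keep the segmentation side within [floor, depth_long_side].

    Hypothetical helper: the floor of 128 comes from the text; the floor
    wins if the depth map is smaller than it.
    """
    return max(floor, min(requested, depth_long_side))
```

For example, a 4000x3000 image with the default cap is processed at 1024x768, and the default segmentation side of 512 is left unchanged at that depth resolution.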

Constants and Defaults

Altitude/FOV defaults: 450 m, 90° (footprint default 10 m).
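Thresholds: std_thresh default 0.005, grad_thresh default 0.1. Both are scaled automatically with the depth resolution, so the slider values act as base values and the pipeline works with the effective values std_thresh_eff and grad_thresh_eff.
Clearance factor: default 1.0 (hazards are dilated by clearance_factor times the footprint size in pixels).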
Coverage strictness: default 0.95 (fraction of the footprint that must be safe).
Texture threshold: default 0.3 (suppresses highly textured regions).
Depth smoothing is supported but set to 0.0 in the UI (effectively off).
Roof mask: SAM3 promptable segmentation (default prompt: roof), resized to depth scale and expanded to footprint size; no depth-based roof heuristics remain.
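For reference, the defaults listed above can be collected in one place. The sketch below is only a convenient summary: the class name `AnalyzerDefaults` is hypothetical, and the field names follow the identifiers used in this document rather than a confirmed structure in app/ui.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzerDefaults:
    """Default parameter values as stated in this document (hypothetical container)."""
    altitude_m: float = 450.0
    fov_deg: float = 90.0
    footprint_m: float = 10.0
    std_thresh: float = 0.005        # base value, scaled with depth resolution
    grad_thresh: float = 0.1         # base value, scaled with depth resolution
    clearance_factor: float = 1.0
    coverage_strictness: float = 0.95
    texture_threshold: float = 0.3
    depth_smoothing_base: float = 0.0  # effectively off in the UI
    process_res_cap: int = 1024
    segmentation_max_side: int = 512
    roof_prompt: str = "roof"
```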

Per-Image Processing Pipeline

Load and crop the selected image (RGB, 5% border removed).
Depth inference: Run Depth Anything 3 (DA3) with the long side clamped to process_res_cap to obtain depth_raw, then detrend it with remove_global_plane. The optional Gaussian blur uses a sigma of depth_smoothing_base * res_scale (currently zero, so no blur is applied).
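Detrending removes the overall tilt of the scene so that flatness is measured relative to the dominant ground plane. One common way to do this is a least-squares plane fit, sketched below. Whether remove_global_plane uses exactly this fit is an assumption; the function name `remove_global_plane_sketch` is hypothetical.

```python
import numpy as np


def remove_global_plane_sketch(depth):
    """Subtract the least-squares plane z = a*x + b*y + c from a 2D depth map.

    Assumption: an ordinary least-squares fit over all pixels; the app's
    remove_global_plane may differ (for example by using a robust fit).
    """
    h, w = depth.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Design matrix with one row per pixel: [x, y, 1].
    A = np.column_stack([xx.ravel(), yy.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, depth.ravel().astype(np.float64), rcond=None)
    plane = (A @ coeffs).reshape(h, w)
    return depth - plane
```

A perfectly tilted plane comes out as (numerically) zero, while local bumps survive as residuals.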
Footprint sizing:
- fx = (W/2) / tan(FOV/2) where W is depth width.
- patch_px = footprint_m * fx / altitude_m, clamped to bounds and forced odd; half_span = patch_px//2.
- Visualization window vis_patch is an odd size capped to 1/8 of the smallest depth dimension for sharper std previews.
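The footprint formulas above translate directly into code. The sketch below follows them term by term; the clamp bounds (`min_px`, `max_px`) and the way an even size is made odd are assumptions, since the document does not state them.

```python
import math


def footprint_patch_px(depth_width, fov_deg, footprint_m, altitude_m,
                       min_px=3, max_px=None):
    """Return (patch_px, half_span) for a landing footprint in depth pixels.

    fx       = (W / 2) / tan(FOV / 2)
    patch_px = footprint_m * fx / altitude_m, clamped and forced odd.
    Assumptions: min_px/max_px bounds and rounding to the nearest pixel.
    """
    fx = (depth_width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    patch = int(round(footprint_m * fx / altitude_m))
    upper = max_px if max_px is not None else depth_width
    patch = max(min_px, min(patch, upper))
    if patch % 2 == 0:
        # Force an odd window so it has a well-defined center pixel.
        patch = patch + 1 if patch + 1 <= upper else patch - 1
    return patch, patch // 2
```

With the defaults (450 m altitude, 90° FOV, 10 m footprint) and a 1024-pixel-wide depth map, fx is about 512 and the footprint comes out as an 11-pixel patch with half_span 5.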
Texture mask: Sobel gradient magnitude computed on the RGB image (after a blur scaled by patch_px/40) and normalized; pixels above texture_threshold are treated as too textured and suppressed.
Segmentation masks (optional):
- Water/Road/Tree/Roof via SAM3 at segmentation_max_side, with text prompts. Instance masks are unioned per class, resized to depth scale, and dilated to footprint size for blocking.
Flat region search (pick_flat_patch):
- Normalize depth to [0,1], compute std_map from box-filtered mean and mean-of-squares (std = sqrt(mean_sq - mean^2)), and compute grad_norm as the np.gradient magnitude normalized by its 95th percentile.
- Landing mask starts from grad_norm < grad_thresh_eff, excludes water if present, and keeps the lowest-variance patch as a fallback box.
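The two maps used by the flat region search can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code of pick_flat_patch: the box filter is implemented here with an integral image and edge padding (the app is described as using OpenCV), and clipping grad_norm to [0, 1] is an assumption. `box_mean` and `flatness_maps` are hypothetical names.

```python
import numpy as np


def box_mean(img, k):
    """Mean over a k x k window (k odd), via an integral image with edge padding."""
    r = k // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Prepend a zero row/column so ii[i, j] is the sum of padded[:i, :j].
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def flatness_maps(depth, patch_px):
    """Return (std_map, grad_norm) for a depth map normalized to [0, 1]."""
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    d = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    mean = box_mean(d, patch_px)
    mean_sq = box_mean(d * d, patch_px)
    # Local standard deviation; clip guards against tiny negative variances.
    std_map = np.sqrt(np.clip(mean_sq - mean * mean, 0.0, None))
    gy, gx = np.gradient(d)
    grad = np.hypot(gx, gy)
    p95 = np.percentile(grad, 95)
    grad_norm = np.clip(grad / p95, 0.0, 1.0) if p95 > 0 else np.zeros_like(grad)
    return std_map, grad_norm
```

Normalizing by the 95th percentile rather than the maximum keeps a few extreme depth edges from compressing the rest of the gradient range.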
Safe mask construction:
- Base safe mask: (std_map < std_thresh_eff) & (grad_norm < grad_thresh_eff) & landing_mask & texture_mask.
- Apply segmentation blocks (expanded masks) to remove water/road/tree/roof regions.
- Clearance: dilate hazards by clearance_factor * patch_px (default 1.0).
- Coverage: box filter with patch_px window; keep pixels meeting coverage_strictness (default 0.95).
- Drop small components (< footprint area).
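The base mask and the coverage step above can be sketched as below. The sketch deliberately omits clearance dilation and small-component removal, and it assumes the hazard mask passed in is already expanded. Treating pixels outside the image as unsafe and intersecting the coverage result with the base mask are assumptions; `window_fraction` and `build_safe_mask` are hypothetical names.

```python
import numpy as np


def window_fraction(mask, k):
    """Fraction of True pixels in each k x k window (k odd).

    Assumption: pixels outside the image count as unsafe (zero padding).
    """
    r = k // 2
    padded = np.pad(mask.astype(np.float64), r)
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = mask.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def build_safe_mask(std_map, grad_norm, landing_mask, texture_ok, hazards,
                    std_thresh_eff, grad_thresh_eff, patch_px,
                    coverage_strictness=0.95):
    """Combine the flatness criteria, remove hazards, then apply footprint coverage."""
    base = ((std_map < std_thresh_eff) & (grad_norm < grad_thresh_eff)
            & landing_mask & texture_ok)
    base &= ~hazards  # segmentation blocks (water/road/tree/roof)
    covered = window_fraction(base, patch_px) >= coverage_strictness
    return base & covered
```

With a 5-pixel footprint and the default strictness of 0.95, a window may contain at most one unsafe pixel out of 25 (24/25 = 0.96) and still pass.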
Center selection:
- Prefer centers where full-footprint coverage exists; choose the largest connected component and rank its pixels by distance-transform value minus a flatness penalty, with openness_weight setting the balance between the two.
- If no full coverage but safe pixels exist, pick the safest point inside the safe mask (distance vs. flatness).
- Fallbacks: landing mask with segmentation removed; if empty, use the flattest patch center.
- Convert depth center to image space; footprint box is scaled to image pixels and clamped to a minimum of 3 px.
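The last step, mapping the chosen center and footprint from depth resolution back to the original image, is a plain rescale. The sketch below assumes independent x/y scale factors and rounding to the nearest pixel; only the 3 px minimum comes from the text. `depth_to_image_box` is a hypothetical name.

```python
def depth_to_image_box(center_xy, patch_px, depth_size, image_size, min_box_px=3):
    """Map a depth-space center and footprint to image pixels.

    center_xy, depth_size and image_size are (x, y) / (width, height) tuples.
    Returns ((cx, cy), (x0, y0, x1, y1)) with the box clamped to the image.
    """
    dw, dh = depth_size
    iw, ih = image_size
    sx, sy = iw / dw, ih / dh
    cx = int(round(center_xy[0] * sx))
    cy = int(round(center_xy[1] * sy))
    # The box is never drawn smaller than min_box_px in either direction.
    box_w = max(min_box_px, int(round(patch_px * sx)))
    box_h = max(min_box_px, int(round(patch_px * sy)))
    x0 = max(0, cx - box_w // 2)
    y0 = max(0, cy - box_h // 2)
    x1 = min(iw, x0 + box_w)
    y1 = min(ih, y0 + box_h)
    return (cx, cy), (x0, y0, x1, y1)
```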
Visualization layers:
- Depth colormap from depth_raw.
- Flatness std preview (std_map_vis), gradient magnitude, gradient mask, flatness heatmap overlay.
- Water/Road/Tree/Roof masks and per-class hazard overlays.
- Safety overlays: green safe heatmap, red hazard overlay from risk_map, grayscale safety score, landing spot box/crosshair.

Safety Heatmap and Hazards

safe_mask drives the green overlay (alpha per pixel).
The hazard overlay uses risk_map, the per-pixel maximum of the std and gradient over-threshold values. Pixels above risk_threshold are emphasized; water/road/tree hazards can also be overlaid separately.

Overlay Composition (`compose_view`)

Base view: one of the named layers (RGB/Depth/Flatness/Gradient/Gradient mask/Water mask/Road mask/Tree mask/Safety score/Safety heatmap overlay).
Overlays: safety heatmap, hazard heatmap, per-class hazards (water/road/tree), gradient, optional landing spot box. Fixed alpha values; toggling overlays does not rerun inference.
Returned image is RGB.
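Because overlays use fixed alpha values and operate on already-rendered layers, composition reduces to repeated alpha blending. The following is a minimal sketch of that idea, not the actual compose_view: the function names, the overlay tuple layout, and the example alpha are all assumptions.

```python
import numpy as np


def blend_overlay(base_rgb, color, alpha_map):
    """Alpha-blend a solid color over an RGB uint8 image using a per-pixel alpha."""
    base = base_rgb.astype(np.float64)
    alpha = np.clip(np.asarray(alpha_map, dtype=np.float64), 0.0, 1.0)[..., None]
    tint = np.asarray(color, dtype=np.float64)
    out = base * (1.0 - alpha) + tint * alpha
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def compose_view_sketch(base_rgb, overlays):
    """Apply overlays in order; each is (rgb_color, boolean_mask, fixed_alpha)."""
    out = base_rgb
    for color, mask, alpha in overlays:
        out = blend_overlay(out, color, mask.astype(np.float64) * alpha)
    return out
```

Since nothing here touches the models, toggling an overlay only repeats this cheap blending step on cached layers.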

Caching and State

Depth model cache keyed by model id (DepthEngine); default model is preloaded.
SAM3 models are cached per id; masks are not cached and are recomputed every run to reflect real-time cost.
images_state holds the latest rendered layers; overlay-only changes don’t rerun inference. Prompt changes only re-trigger processing on submit/Run, not every keystroke.
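The caching policy described above (cache the loaded model per id, recompute outputs every run) can be illustrated with a small per-id cache. `ModelCache` and the loader callable are hypothetical; the real DepthEngine and SAM3 loading code are not shown in this document.

```python
from typing import Any, Callable, Dict


class ModelCache:
    """Load each model id at most once; outputs are intentionally not cached."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader          # hypothetical: builds a model from its id
        self._models: Dict[str, Any] = {}

    def get(self, model_id: str) -> Any:
        if model_id not in self._models:
            self._models[model_id] = self._loader(model_id)
        return self._models[model_id]

    def preload(self, model_id: str) -> None:
        """Warm the cache, e.g. for the default model at startup."""
        self.get(model_id)
```

Keeping only the model in the cache means every run pays the inference cost, which matches the stated goal of reflecting real-time cost.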

User Controls and Effects

process_res_cap: depth max side (px) for DA3.
footprint_m, altitude_m, fov_deg: determine footprint size in pixels.
std_thresh, grad_thresh, texture_threshold: safety criteria.
clearance_factor: hazard dilation (default 1.0).
coverage_strictness, openness_weight: coverage tolerance and center ranking.
Segmentation toggles, prompts, segmentation_max_side, segmentation_score_thresh, segmentation_mask_thresh: control SAM3 masks for water/road/tree.
Overlay toggles affect only display; inference results are reused.

Error Handling

Bad/missing inputs raise Gradio errors.
Segmentation failures warn and proceed without that mask; water/road/tree/roof warnings clarify when masks are disabled or not detected.
Coverage/boxFilter fallbacks keep processing even if OpenCV operations fail.
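The "warn and proceed" behaviour for segmentation can be sketched as a loop that isolates each mask's failure. This is an assumed structure for illustration only: `run_optional_masks`, the per-class callables, and the warning strings are hypothetical.

```python
def run_optional_masks(segmenters, image, warnings_out):
    """Run each per-class segmenter; on failure, record a warning and continue.

    segmenters maps a class name (e.g. "water") to a callable returning a
    mask or None. Failed or empty classes are left out of the result.
    """
    masks = {}
    for name, segment in segmenters.items():
        try:
            mask = segment(image)
        except Exception as exc:  # keep the run alive if one mask fails
            warnings_out.append(f"{name} mask skipped: {exc}")
            continue
        if mask is None:
            warnings_out.append(f"{name} mask not detected")
            continue
        masks[name] = mask
    return masks
```

Downstream steps then treat a missing class as "no block for this hazard" and surface the collected warnings in the run summary.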

Outputs

Dict of PIL Images keyed by: RGB, Depth, Flatness map (std), Depth gradient, Gradient mask, Water mask, Road mask, Tree mask, Roof mask, Safety heatmap overlay, Hazard overlay, Water/Road/Tree hazard overlays, Flatness heatmap overlay, Safety score (grayscale), Landing spot overlay.
Run summaries report the model id, process resolution, runtime, footprint size (at both depth and image scale), landing center, safe/hazard coverage, effective thresholds, and per-mask coverage (water/road/tree/roof), along with warnings for disabled or absent masks and for missing safe regions. The UI cards render these fields directly.
compose_view uses these to build the preview.