	# Landing Site Safety Analyzer – Architecture and Calculations

	This document describes the flow in the current Gradio app (`app/ui.py`), from input selection through model inference, safety scoring, and UI composition.

	## Data and Models
	- Inputs: Images under `data/Image/` (VISLOC and any custom folders) via `list_all_data_inputs`, with a 5% border crop (`crop_nonblack`) to drop black padding. Supported extensions: jpg/jpeg/png (any case).
	- Depth model: Depth Anything 3, cached per model id (`DepthEngine`). Inference caps the long side to `process_res_cap` (default 1024) using `upper_bound_resize` before predicting.
	- Segmentation model: SAM3 (`facebook/sam3`) for promptable water/road/tree/roof masking. The segmenter is cached per model id but masks are recomputed every run (no output cache). Default `segmentation_max_side` is 512 and is clamped to the depth resolution (min 128).
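
The two resolution limits above can be sketched as plain size arithmetic. This is a minimal illustration, not the app's code: `cap_long_side` stands in for whatever `upper_bound_resize` does internally, `clamp_segmentation_side` is a hypothetical helper name, and the assumption that the 128 px floor wins when the depth map is smaller than 128 px is a guess at how "clamped to the depth resolution (min 128)" resolves.

```python
from __future__ import annotations


def cap_long_side(width: int, height: int, cap: int = 1024) -> tuple[int, int]:
    """Shrink (never enlarge) so that max(width, height) <= cap, keeping aspect ratio."""
    long_side = max(width, height)
    if long_side <= cap:
        return width, height  # already within the cap, leave untouched
    scale = cap / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def clamp_segmentation_side(requested: int, depth_long_side: int, floor: int = 128) -> int:
    """Limit the segmentation side to the depth resolution, but not below `floor`.

    Assumption: the floor takes priority if the depth map is smaller than it.
    """
    return max(floor, min(requested, depth_long_side))
```

For instance, a 4000×3000 input would be processed at 1024×768 for depth, and the default segmentation side of 512 would pass through unchanged.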

	## Constants and Defaults
	- Altitude/FOV defaults: 450 m, 90° (footprint default 10 m).
	- Thresholds: `std_thresh` default 0.005, `grad_thresh` default 0.1; both auto-scale with depth resolution so sliders act as base values.
	- Clearance factor: default 1.0 (dilates hazards by `clearance_factor` times the footprint size in pixels).
	- Coverage strictness: default 0.95 (fraction of the footprint that must be safe).
	- Texture threshold: default 0.3 (suppresses highly textured regions).
	- Depth smoothing is supported but set to 0.0 in the UI (effectively off).
	- Roof mask: SAM3 promptable segmentation (default prompt: `roof`), resized to depth scale and expanded to footprint size; no depth-based roof heuristics remain.
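
For a sense of scale, the altitude, FOV, and footprint defaults above can be turned into a footprint size in depth pixels using the pinhole relation `fx = (W/2) / tan(FOV/2)` from the pipeline's footprint-sizing step. The sketch below is illustrative only: the function name `footprint_px`, the rounding rule, the clamp bounds (`min_px`, `max_px`), and the way an even size is made odd are assumptions, not the app's actual code.

```python
import math
from typing import Optional


def footprint_px(
    depth_width: int,
    footprint_m: float = 10.0,
    altitude_m: float = 450.0,
    fov_deg: float = 90.0,
    min_px: int = 3,               # assumed lower bound
    max_px: Optional[int] = None,  # assumed upper bound (defaults to the depth width)
) -> int:
    """Footprint side length in depth-map pixels, forced to an odd integer."""
    # Focal length in pixels from the horizontal field of view.
    fx = (depth_width / 2) / math.tan(math.radians(fov_deg) / 2)
    # Project the physical footprint onto the image plane at the given altitude.
    patch = int(round(footprint_m * fx / altitude_m))
    upper = max_px if max_px is not None else depth_width
    patch = max(min_px, min(patch, upper))
    # Assumption: even sizes are bumped up by one so the window has a center pixel.
    if patch % 2 == 0:
        patch += 1
    return patch


patch_px = footprint_px(1024)   # defaults: 10 m footprint, 450 m altitude, 90° FOV
half_span = patch_px // 2
print(patch_px, half_span)      # 11 5
```

At the default 1024 px depth width, a 10 m footprint seen from 450 m therefore spans only about 11 depth pixels, which is why the thresholds and window sizes downstream are tied to `patch_px`.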

	## Per-Image Processing Pipeline
	1. Load and crop the selected image (RGB, 5% border removed).
	2. Depth inference: Run DA3 with long side clamped to `process_res_cap`; obtain `depth_raw`, then detrend with `remove_global_plane`. Optional Gaussian blur uses `depth_smoothing_base * res_scale` (currently zero).
	3. Footprint sizing:
	- `fx = (W/2) / tan(FOV/2)` where `W` is depth width.
	- `patch_px = footprint_m * fx / altitude_m`, clamped to bounds and forced odd; `half_span = patch_px//2`.
	- Visualization window `vis_patch` is an odd size capped to 1/8 of the smallest depth dimension for sharper std previews.
	4. Texture mask: Sobel gradient magnitude of the RGB image (blurred by `patch_px/40`), normalized; pixels above `texture_threshold` are suppressed.
	5. Segmentation masks (optional):
	- Water/Road/Tree/Roof via SAM3 at `segmentation_max_side`, with text prompts. Instance masks are unioned per class, resized to depth scale, and dilated to footprint size for blocking.
	6. Flat region search (`pick_flat_patch`):
	- Normalize depth to [0,1], compute `std_map` via box mean/mean_sq, and `grad_norm` via `np.gradient` normalized at the 95th percentile.
	- The landing mask starts from `grad_norm < grad_thresh_eff` and excludes water if a water mask is present. The lowest-variance patch is kept separately as a fallback box.
	7. Safe mask construction:
	- Base safe mask: `(std_map < std_thresh_eff) & (grad_norm < grad_thresh_eff) & landing_mask & texture_mask`.
	- Apply segmentation blocks (expanded masks) to remove water/road/tree/roof regions.
	- Clearance: dilate hazards by `clearance_factor * patch_px` (default 1.0).
	- Coverage: box filter with `patch_px` window; keep pixels meeting `coverage_strictness` (default 0.95).
	- Drop small components (< footprint area).
	8. Center selection:
	- Prefer centers where full-footprint coverage exists; choose the largest component and rank by distance transform minus flatness penalty (`openness_weight`).
	- If no full coverage but safe pixels exist, pick the safest point inside the safe mask (distance vs. flatness).
	- Fallbacks: landing mask with segmentation removed; if empty, use the flattest patch center.
	- Convert depth center to image space; footprint box is scaled to image pixels and clamped to a minimum of 3 px.
	9. Visualization layers:
	- Depth colormap from `depth_raw`.
	- Flatness std preview (`std_map_vis`), gradient magnitude, gradient mask, flatness heatmap overlay.
	- Water/Road/Tree/Roof masks and per-class hazard overlays.
	- Safety overlays: green safe heatmap, red hazard overlay from `risk_map`, grayscale safety score, landing spot box/crosshair.
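
Steps 6 and 7 above (local standard deviation from box mean/mean-of-squares, gradient normalized at the 95th percentile, then thresholding and a coverage filter) can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: `box_mean`, `flatness_maps`, and `coverage_safe_mask` are hypothetical names, the box filter is written with an integral image instead of OpenCV, the normalized gradient is clipped to [0, 1], and the texture, landing, and segmentation masks, clearance dilation, and small-component removal are left out.

```python
import numpy as np


def box_mean(img: np.ndarray, k: int) -> np.ndarray:
    """Mean over a k x k window (k odd), with replicate padding at the borders."""
    r = k // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Integral image with a leading zero row and column.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def flatness_maps(depth: np.ndarray, patch_px: int):
    """Return (std_map, grad_norm) for a depth map normalized to [0, 1]."""
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    d = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    mean = box_mean(d, patch_px)
    mean_sq = box_mean(d * d, patch_px)
    # Var = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    std_map = np.sqrt(np.clip(mean_sq - mean * mean, 0.0, None))
    gy, gx = np.gradient(d)
    grad = np.hypot(gx, gy)
    p95 = np.percentile(grad, 95)
    # Assumption: values above the 95th percentile saturate at 1.
    grad_norm = np.clip(grad / p95, 0.0, 1.0) if p95 > 0 else np.zeros_like(grad)
    return std_map, grad_norm


def coverage_safe_mask(std_map, grad_norm, std_t, grad_t, patch_px, coverage=0.95):
    """Threshold both maps, then require `coverage` of each footprint window to be safe."""
    base = (std_map < std_t) & (grad_norm < grad_t)
    safe_fraction = box_mean(base.astype(np.float64), patch_px)
    return base & (safe_fraction >= coverage)
```

On a synthetic depth map with a single vertical step, pixels far from the step pass, pixels on the step fail both thresholds, and pixels within half a footprint of the failing band are rejected by the coverage filter even though they are locally flat.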

	## Safety Heatmap and Hazards
	- `safe_mask` drives the green overlay (alpha per pixel).
	- The hazard overlay uses `risk_map` (the maximum of the std and gradient amounts over their thresholds). Pixels above `risk_threshold` are emphasized; water/road/tree hazards can also be overlaid separately.
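
One plausible reading of "max of std/grad over-threshold" is sketched below. The exact normalization is not stated in this document, so the relative-excess formula, the saturation at twice the threshold, and the function name `risk_from_excess` are all assumptions made for illustration.

```python
import numpy as np


def risk_from_excess(std_map, grad_norm, std_t, grad_t):
    """Per-pixel risk in [0, 1]: the larger of the two relative threshold excesses.

    Assumption: excess is measured relative to the threshold and saturates
    once a value reaches twice its threshold.
    """
    std_excess = np.clip(np.asarray(std_map, dtype=np.float64) / std_t - 1.0, 0.0, 1.0)
    grad_excess = np.clip(np.asarray(grad_norm, dtype=np.float64) / grad_t - 1.0, 0.0, 1.0)
    return np.maximum(std_excess, grad_excess)


def emphasized(risk_map, risk_threshold):
    """Boolean mask of the pixels the hazard overlay would emphasize."""
    return risk_map > risk_threshold
```

Under this reading, a pixel that is at or below both thresholds has zero risk, and either criterion alone can push a pixel into the emphasized set.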

	## Overlay Composition (`compose_view`)
	- Base view: one of the named layers (RGB/Depth/Flatness/Gradient/Gradient mask/Water mask/Road mask/Tree mask/Safety score/Safety heatmap overlay).
	- Overlays: safety heatmap, hazard heatmap, per-class hazards (water/road/tree), gradient, optional landing spot box. Fixed alpha values; toggling overlays does not rerun inference.
	- Returned image is RGB.
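
The overlay step can be pictured as per-pixel alpha blending on top of the chosen base layer. The sketch below is a generic blend, not the body of `compose_view`: the helper name `blend_overlay`, the tint colors, and the alpha value are hypothetical.

```python
import numpy as np


def blend_overlay(base_rgb: np.ndarray, color, alpha_map: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Tint `base_rgb` (HxWx3 uint8) with `color`, weighted per pixel by `alpha_map` in [0, 1].

    `alpha` is the fixed layer opacity; `alpha_map` lets a heatmap fade in and out.
    """
    base = base_rgb.astype(np.float32)
    a = (np.clip(alpha_map, 0.0, 1.0) * alpha)[..., None].astype(np.float32)
    tint = np.asarray(color, dtype=np.float32)
    out = base * (1.0 - a) + tint * a
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Layers are applied in order on a copy of the base view; inputs are left untouched.
base = np.zeros((2, 2, 3), dtype=np.uint8)
safe_alpha = np.array([[1.0, 0.0], [0.5, 0.0]])
view = blend_overlay(base, (0, 255, 0), safe_alpha)            # green safe heatmap
view = blend_overlay(view, (255, 0, 0), np.zeros((2, 2)))      # red hazard layer, empty here
```

Because blending only reads cached layers, toggling an overlay amounts to re-running this cheap composition rather than the models.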

	## Caching and State
	- Depth model cache keyed by model id (`DepthEngine`); default model is preloaded.
	- SAM3 models are cached per id; masks are not cached and are recomputed every run to reflect real-time cost.
	- `images_state` holds the latest rendered layers; overlay-only changes don’t rerun inference. Prompt changes only re-trigger processing on submit/Run, not every keystroke.
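
The "cache the model, not the outputs" policy can be sketched with a small keyed cache. `ModelCache` and its loader callback are hypothetical names used for illustration; the app's `DepthEngine` may be structured differently.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class ModelCache(Generic[T]):
    """Load each model id at most once; inference outputs are deliberately not stored."""

    def __init__(self, loader: Callable[[str], T]) -> None:
        self._loader = loader
        self._models: Dict[str, T] = {}

    def get(self, model_id: str) -> T:
        # Only the first request for an id pays the loading cost.
        if model_id not in self._models:
            self._models[model_id] = self._loader(model_id)
        return self._models[model_id]

    def preload(self, model_id: str) -> None:
        """Warm the cache, e.g. for the default model at startup."""
        self.get(model_id)
```

Every run would still call the cached model on the current image, so the reported runtime includes segmentation cost while avoiding repeated weight loading.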

	## User Controls and Effects
	- `process_res_cap`: depth max side (px) for DA3.
	- `footprint_m`, `altitude_m`, `fov_deg`: determine footprint size in pixels.
	- `std_thresh`, `grad_thresh`, `texture_threshold`: safety criteria.
	- `clearance_factor`: hazard dilation (default 1.0).
	- `coverage_strictness`, `openness_weight`: coverage tolerance and center ranking.
	- Segmentation toggles, prompts, `segmentation_max_side`, `segmentation_score_thresh`, `segmentation_mask_thresh`: control SAM3 masks for water/road/tree/roof.
	- Overlay toggles affect only display; inference results are reused.

	## Error Handling
	- Bad/missing inputs raise Gradio errors.
	- Segmentation failures warn and proceed without that mask; water/road/tree/roof warnings clarify when masks are disabled or not detected.
	- If an OpenCV operation in the coverage step (for example `boxFilter`) fails, a fallback path lets processing continue.
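
The warn-and-proceed behavior for segmentation can be sketched as a loop that isolates each class. The function name `run_segmenters` and the convention that a segmenter returns `None` when nothing is detected are assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_segmenters(
    segmenters: Dict[str, Callable[[], Optional[object]]],
) -> Tuple[Dict[str, object], List[str]]:
    """Run each class segmenter independently; collect masks and human-readable warnings."""
    masks: Dict[str, object] = {}
    warnings: List[str] = []
    for name, segment in segmenters.items():
        try:
            mask = segment()
        except Exception as exc:  # one failing class must not abort the whole run
            warnings.append(f"{name}: segmentation failed ({exc}); continuing without this mask.")
            continue
        if mask is None:
            warnings.append(f"{name}: nothing detected.")
            continue
        masks[name] = mask
    return masks, warnings
```

The collected warnings can then be surfaced in the run summary alongside the masks that did succeed.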

	## Outputs
	- Dict of PIL Images keyed by: RGB, Depth, Flatness map (std), Depth gradient, Gradient mask, Water mask, Road mask, Tree mask, Roof mask, Safety heatmap overlay, Hazard overlay, Water/Road/Tree hazard overlays, Flatness heatmap overlay, Safety score (grayscale), Landing spot overlay.
	- Run summaries surface model id, process resolution, runtime, footprint size (depth + image scale), landing center, safe/hazard coverage, effective thresholds, per-mask coverage (water/road/tree/roof), and warnings for disabled/absent masks or missing safe regions; the UI cards render these fields directly.
	- `compose_view` uses these to build the preview.