| ```markdown | |
| # Design Doc: Motion- & Coverage-Aware Key Frame Selection | |
| **Author:** Brian Clark | |
| **Last Updated:** 2025-11-07 | |
| **Target Components:** `_compute_selected_frames`, Stream3R inference outputs | |
| **Goal:** Replace naive FPS sampling with a strategy that keeps only frames providing new camera poses and meaningful scene coverage, reducing point-cloud clutter and improving 2D scene graphs. | |
| --- | |
| ## 1. Overview | |
| We combine two complementary signals: | |
| 1. **Motion-aware downsampling (Option A):** ensure key frames are spaced by actual camera movement (SE(3) distance), not just time. | |
| 2. **Coverage-driven selection (Option B):** prefer frames that contribute new high-confidence geometry after Stream3R processing. | |
| The final key frame list is built by enforcing motion diversity first, then greedily adding the frames with the largest marginal coverage gain (geometry not yet covered by earlier picks) until we reach a target budget. | |
| --- | |
| ## 2. Inputs & Prerequisites | |
| - Per-frame camera extrinsics (`extrinsic`) from Stream3R. | |
| - Optional per-frame quality metrics (blur/confidence) from camera head. | |
| - Stream3R `world_points` and `world_points_conf` (or post-voxel-reduction point maps) to evaluate coverage. | |
| - Library support: NumPy + SciPy (for SE(3) distances), optional Open3D or custom KD-tree for point coverage. | |
| --- | |
| ## 3. Motion Metrics (Option A) | |
| ### 3.1 Pose difference | |
| - Compute translation delta: `||t_i - t_j||`. | |
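| - Compute rotation delta: the angle of the relative rotation `R_i * R_j^{-1}`, i.e. `acos((trace(R_i * R_j^{-1}) - 1) / 2)`, with the argument clamped to `[-1, 1]` to absorb floating-point drift. | |
| - Combine with weights (e.g., `motion = w_t * Δpos + w_r * Δrot`), with defaults `w_t=1.0`, `w_r=0.5 m/rad`, so that 1 rad of rotation counts as much as 0.5 m of translation. | |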
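The pose difference in 3.1 can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `pose_delta` is hypothetical, and it assumes the `extrinsic` matrices are camera-to-world poses (3x4 or 4x4) so that the translation column is the camera position. If Stream3R emits world-to-camera matrices, they would need inverting first.

```python
import numpy as np

def pose_delta(T_i, T_j, w_t=1.0, w_r=0.5):
    """Weighted motion between two camera poses (3x4 or 4x4, assumed camera-to-world).

    Returns (motion, d_pos, d_rot) with d_pos in metres and d_rot in radians.
    """
    T_i, T_j = np.asarray(T_i, dtype=float), np.asarray(T_j, dtype=float)
    R_i, t_i = T_i[:3, :3], T_i[:3, 3]
    R_j, t_j = T_j[:3, :3], T_j[:3, 3]
    d_pos = float(np.linalg.norm(t_i - t_j))
    # The inverse of a rotation matrix is its transpose.
    R_rel = R_i @ R_j.T
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0
    # Clamp before acos: rounding can push the value slightly outside [-1, 1].
    d_rot = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return w_t * d_pos + w_r * d_rot, d_pos, d_rot
```

With the default weights, a 1 m translation combined with a 90° turn scores `1.0 + 0.5 * π/2`.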
| ### 3.2 Greedy spacing (temporal pass) | |
| 1. Initialize with first frame as key. | |
| 2. For each subsequent frame: | |
| - Accumulate motion distance from last key (sum of per-frame deltas). | |
| - If distance ≥ `motion_threshold` OR time since last key ≥ `max_gap`, mark as key. | |
| - Optional: enforce a minimum time gap (`min_gap`) between consecutive keys to avoid bursty picks. | |
| 3. Result: `motion_keys` – baseline set with adequate pose coverage. | |
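A minimal sketch of the temporal pass in 3.2, assuming the per-frame motion deltas and timestamps have already been collected into sequences. The argument layout (`step_motion[i]` is the motion between frame `i-1` and frame `i`) is an assumption made for illustration.

```python
def greedy_motion_selection(step_motion, timestamps, motion_threshold=0.3,
                            min_gap=0.25, max_gap=2.0):
    """Return indices of motion key frames.

    step_motion[i] is the motion between frame i-1 and frame i
    (step_motion[0] is ignored); timestamps are in seconds.
    """
    if len(timestamps) == 0:
        return []
    keys = [0]            # the first frame is always a key
    accumulated = 0.0     # motion summed since the last key
    for i in range(1, len(timestamps)):
        accumulated += step_motion[i]
        elapsed = timestamps[i] - timestamps[keys[-1]]
        if elapsed < min_gap:
            continue      # too soon after the last key; keep accumulating
        if accumulated >= motion_threshold or elapsed >= max_gap:
            keys.append(i)
            accumulated = 0.0
    return keys
```

In this sketch the minimum gap takes precedence over the motion threshold, so a fast camera sweep cannot emit keys more often than every `min_gap` seconds; the motion it accumulates in the meantime still counts toward the next key.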
| ### 3.3 Quality gating (optional) | |
| - Discard frames with low focus / brightness (if metadata available). | |
| - Use confidence summary (mean `world_points_conf`) to veto worst frames before motion selection. | |
| --- | |
| ## 4. Coverage Metrics (Option B) | |
| ### 4.1 Coverage data | |
| - For each frame, gather the subset of point cloud indices it contributes above a confidence threshold. | |
| - Option 1: Use raw `world_points_conf` mask per frame. | |
| - Option 2: After voxel reduction, store voxel IDs touched by each frame (during inference loop). | |
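One way to realize Option 2 is to quantize each frame's confident points into integer voxel keys. The sketch below is illustrative only: `frame_voxel_ids` and the `voxel_size` default of 0.05 m are assumptions, and it expects `world_points` shaped `(..., 3)` with a matching `world_points_conf`.

```python
import numpy as np

def frame_voxel_ids(world_points, world_points_conf, conf_thres=0.3, voxel_size=0.05):
    """Return the set of voxel keys one frame touches above the confidence threshold."""
    pts = np.asarray(world_points, dtype=float).reshape(-1, 3)
    conf = np.asarray(world_points_conf, dtype=float).reshape(-1)
    # Keep confident points only, and drop NaN/inf coordinates.
    keep = (conf >= conf_thres) & np.isfinite(pts).all(axis=1)
    if not keep.any():
        return set()
    voxels = np.floor(pts[keep] / voxel_size).astype(np.int64)
    unique = np.unique(voxels, axis=0)   # deduplicate rows
    return {tuple(v) for v in unique.tolist()}
```

Storing plain Python tuples keeps the later set arithmetic simple; for very large scenes a packed 64-bit hash per voxel would use less memory.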
| ### 4.2 Greedy coverage selection | |
| 1. Start with `coverage_keys = []`, `covered = set()`. | |
| 2. For each candidate frame (ordered by motion selection or confidence): | |
| - Compute `gain = len(new_points) / total_points`, where `new_points` is the set of the frame's points (or voxel IDs) not yet in `covered`. | |
| - Keep a priority queue sorted by gain (breaking ties via motion distance or confidence). | |
| 3. While `coverage_keys` size < desired target (`top_k` or auto budget) and the best remaining gain is at least `min_gain_ratio`: | |
| - Pop frame with highest gain. | |
| - Add to `coverage_keys` and update `covered`. | |
| - Recompute gains lazily: a frame's gain can only shrink as `covered` grows, so stored values are upper bounds and only the frame at the top of the queue needs re-evaluation. | |
| 4. Merge with `motion_keys`: `selected = sorted(motion_keys ∪ coverage_keys)` preserving chronological order. | |
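The greedy coverage loop with lazy re-evaluation can be sketched with the standard-library `heapq`. This is an assumption-laden illustration: `contributions` is taken to be a dict from frame index to a set of point or voxel IDs, and ties fall back to the lower frame index rather than motion distance or confidence.

```python
import heapq

def greedy_coverage_selection(contributions, budget, min_gain_ratio=0.01):
    """Pick frames by marginal coverage gain; returns [(frame_index, gain), ...]."""
    total = len(set().union(*contributions.values())) if contributions else 0
    if total == 0 or budget <= 0:
        return []
    covered, selected = set(), []
    # heapq is a min-heap, so gains are negated. The stored counts are upper
    # bounds, because a frame's gain never increases as `covered` grows.
    heap = [(-len(ids), idx) for idx, ids in contributions.items()]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, idx = heapq.heappop(heap)
        fresh = len(contributions[idx] - covered)
        if heap and fresh < -heap[0][0]:
            # Stale entry: another frame may now be better, so re-queue it.
            heapq.heappush(heap, (-fresh, idx))
            continue
        if fresh / total < min_gain_ratio:
            break  # the best remaining frame adds too little
        selected.append((idx, fresh / total))
        covered |= contributions[idx]
    return selected
```

Because only the head of the queue is re-scored, most frames are never recomputed after their initial count, which matters when each frame touches many thousands of voxels.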
| ### 4.3 Parameters | |
| | Parameter | Purpose | Default | | |
| |-----------|---------|---------| | |
| | `coverage_conf_thres` | Minimum confidence per point | 0.3 | | |
| | `top_k` | Max key frames (if >0) | Provided payload | | |
| | `auto_budget_seconds` | If `top_k` is not set, target key-frame rate used to derive the budget from the scene duration | 0.4 fps (≈12 frames for 30 s) | |
| | `min_gain_ratio` | Stop if marginal gain < threshold | 0.01 | | |
| --- | |
| ## 5. Algorithm Outline | |
| ```text | |
| 1. Precompute per-frame metadata: | |
| - Motion deltas & cumulative distance | |
| - Frame quality/confidence | |
| - Coverage contributions (voxel IDs or hashed points) | |
| 2. Motion pass: | |
| motion_keys = greedy_motion_selection(frames, motion_threshold, min_gap, max_gap) | |
| 3. Coverage pass: | |
| candidates = frames filtered by quality & (if large scenes) downsampled using motion_keys as seeds | |
| coverage_keys = greedy_coverage_selection(candidates, contributions, budget) | |
| 4. Combine & finalize: | |
| selected = sort(unique(motion_keys ∪ coverage_keys)) | |
| if len(selected) > budget: prune lowest coverage gain while keeping motion anchors | |
| collect metadata (confidence, motion distance, coverage gain) for diagnostics | |
| 5. Optional reinflation pass (if enabled) to restore splat density for the selected frames only. | |
| 6. Emit diagnostics in `selected_frames.json`. | |
| ``` | |
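Step 4 of the outline (combine, prune, collect diagnostics) might look like the sketch below. The function name and argument shapes are hypothetical; the `decision` and `coverage_gain` fields mirror the diagnostics described in section 6.3, and `frame_id` is filled with the frame index purely for illustration.

```python
def combine_and_prune(motion_keys, coverage_gain, budget):
    """Merge motion and coverage picks, pruning coverage-only frames over budget.

    motion_keys: iterable of frame indices chosen by the motion pass.
    coverage_gain: dict frame index -> gain for frames chosen by the coverage pass.
    """
    anchors = set(motion_keys)
    selected = anchors | set(coverage_gain)
    # Weakest coverage-only frames go first; motion anchors are never pruned,
    # so the result can exceed `budget` if the anchors alone do.
    removable = sorted((i for i in selected if i not in anchors),
                       key=lambda i: coverage_gain[i])
    while len(selected) > budget and removable:
        selected.discard(removable.pop(0))
    return [
        {
            "frame_id": i,
            "decision": "motion" if i in anchors else "coverage",
            "coverage_gain": coverage_gain.get(i),
        }
        for i in sorted(selected)  # chronological order
    ]
```

A frame picked by both passes is reported as a motion anchor here; recording both reasons would be an equally reasonable choice.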
| --- | |
| ## 6. Integration Points | |
| ### 6.1 `_compute_selected_frames` | |
| - Extend signature to accept: | |
| - `frame_records` (already present) | |
| - `extrinsics`, `world_points`, `world_points_conf` | |
| - optional `confidence_summary`, `frame_timestamps` | |
| - Return list of dicts with fields: `frame_id`, `motion_score`, `coverage_gain`, `cum_motion`, etc., so the artifact can explain the reasoning. | |
| ### 6.2 Inference loop | |
| - While iterating frames, record: | |
| - Pose deltas (store to arrays for later). | |
| - Coverage bitsets: e.g., hash voxel indices (`np.floor(world_points / voxel_size)`). | |
| - Quality metrics (mean conf, brightness). | |
| ### 6.3 Job artifacts | |
| - Include selection diagnostics in `selected_frames.json`: | |
| ```json | |
| { | |
| "frame_id": "...", | |
| "motion_distance": 0.45, | |
| "coverage_gain": 0.12, | |
| "decision": "coverage" | |
| } | |
| ``` | |
| - Enables auditing the chosen frames. | |
| ### 6.4 Two-pass pipeline hook | |
| - Add a config flag (e.g., `STREAM3R_KEYFRAME_PREPASS`) to toggle a lightweight pre-pass. | |
| - **Pre-pass steps:** | |
| 1. Collect frames as usual. | |
| 2. Run a reduced inference loop (camera head only or full Stream3R with artifact generation disabled) to gather motion and coverage metadata. | |
| 3. Execute the key-frame selection algorithm to produce selected indices. | |
| - **Main pass:** | |
| 1. Filter `frame_records` to the selected indices. | |
| 2. If the batch size is below a configured maximum, switch inference to full attention; otherwise remain in window mode. | |
| 3. Run the full artifact pipeline (pointmaps, GLB, reinflation) on the reduced set. | |
| 4. Persist selection diagnostics alongside artifacts. | |
| - Provide a fallback path: if the pre-pass fails or returns too few frames, revert to the original sampling strategy so the job still succeeds. | |
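The fallback path can be kept very small. In this sketch `select_with_fallback`, `uniform_sampling`, the `min_frames` default, and the `step` default are all hypothetical names and values chosen for illustration; the real original sampling strategy would be passed in as `fallback_fn`.

```python
def uniform_sampling(frame_records, step=15):
    """Stand-in for the original sampling strategy: every `step`-th frame."""
    return list(range(0, len(frame_records), step))

def select_with_fallback(frame_records, select_fn, fallback_fn, min_frames=2):
    """Run the key-frame pre-pass; revert to `fallback_fn` on failure.

    Returns (indices, reason) so the reason can be logged with the artifacts.
    """
    try:
        indices = select_fn(frame_records)
    except Exception as exc:  # deliberately broad: the job must still succeed
        return fallback_fn(frame_records), f"fallback: {exc}"
    if len(indices) < min_frames:
        return fallback_fn(frame_records), "fallback: too few frames"
    return indices, "keyframe"
```

Returning the reason alongside the indices makes it easy to record in `selected_frames.json` whether a job used the new selection or the original sampling.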
| --- | |
| ## 7. Configuration & Defaults | |
| | Setting | Description | Default | | |
| |---------|-------------|---------| | |
| | `STREAM3R_KEYFRAME_MOTION_THRESH` | Motion distance (m) to trigger new key | 0.3 | | |
| | `STREAM3R_KEYFRAME_ROT_THRESH` | Rotation weight `w_r` (m/rad) applied to the rotation angle in the motion score | 0.5 | |
| | `STREAM3R_KEYFRAME_MIN_GAP` | Minimum time gap (s) | 0.25 | | |
| | `STREAM3R_KEYFRAME_MAX_GAP` | Max time between keys (s) | 2.0 | | |
| | `STREAM3R_KEYFRAME_TOP_K` | Max number of key frames | 18 (overridable per payload) | | |
| | `STREAM3R_KEYFRAME_MIN_GAIN` | Coverage gain stop threshold | 0.01 | | |
| | `STREAM3R_KEYFRAME_CONF_THRESH` | Confidence threshold for coverage | 0.3 | | |
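The settings above could be loaded from environment variables along these lines. The helper name `load_keyframe_config` and the short dictionary keys are assumptions; the variable names and defaults are copied from the table, and a positive payload `top_k` overrides the environment value.

```python
import os

# key -> (environment variable, caster, default), taken from the table above
_SETTINGS = {
    "motion_thresh": ("STREAM3R_KEYFRAME_MOTION_THRESH", float, 0.3),
    "rot_weight": ("STREAM3R_KEYFRAME_ROT_THRESH", float, 0.5),
    "min_gap": ("STREAM3R_KEYFRAME_MIN_GAP", float, 0.25),
    "max_gap": ("STREAM3R_KEYFRAME_MAX_GAP", float, 2.0),
    "top_k": ("STREAM3R_KEYFRAME_TOP_K", int, 18),
    "min_gain": ("STREAM3R_KEYFRAME_MIN_GAIN", float, 0.01),
    "conf_thresh": ("STREAM3R_KEYFRAME_CONF_THRESH", float, 0.3),
}

def load_keyframe_config(env=None, payload_top_k=None):
    """Build a settings dict from the environment, then apply the payload override."""
    env = os.environ if env is None else env
    cfg = {}
    for key, (name, cast, default) in _SETTINGS.items():
        raw = env.get(name)
        # Unset or empty variables fall back to the documented default.
        cfg[key] = cast(raw) if raw not in (None, "") else default
    if payload_top_k is not None and payload_top_k > 0:
        cfg["top_k"] = int(payload_top_k)
    return cfg
```

Passing `env` explicitly (a plain dict) keeps the loader easy to unit-test without touching the process environment.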
| --- | |
| ## 8. Validation Plan | |
| 1. **Quantitative** | |
| - Compare key frame counts vs. baseline (2 fps sampling). | |
| - Measure point coverage retention (% of original points represented by key frames). | |
| - Measure how many selected frames coincide with plain uniform (linear) sampling; the overlap should drop if selection is actually adapting to motion and coverage. | |
| 2. **Qualitative** | |
| - Visual inspection: point cloud clutter reduction, better 2D scene graph clarity. | |
| - Spot-check key-frame artifacts (diagnostic metadata) to ensure decisions align with expectations. | |
| 3. **Performance** | |
| - Ensure coverage computations remain efficient (hash-based; track memory usage). | |
| - Add timing logs in `_compute_selected_frames`. | |
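Two of the quantitative checks can be computed directly from the per-frame coverage sets. Both helpers below are hypothetical sketches: `contributions` is assumed to map frame index to a set of point or voxel IDs, and the uniform baseline is assumed to take every `step`-th frame.

```python
def coverage_retention(contributions, selected):
    """Fraction of all covered IDs that the selected frames still represent."""
    all_ids = set().union(*contributions.values()) if contributions else set()
    if not all_ids:
        return 0.0
    kept = set().union(*(contributions[i] for i in selected)) if selected else set()
    return len(kept) / len(all_ids)

def overlap_with_uniform(selected, n_frames, step):
    """Fraction of selected frames that uniform sampling would also have picked."""
    uniform = set(range(0, n_frames, step))
    return len(uniform & set(selected)) / max(1, len(selected))
```

High retention together with low overlap suggests the selection keeps the geometry while choosing different frames than the baseline would.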
| --- | |
| ## 9. Future Extensions | |
| - Integrate image-content heuristics (entropy, saliency) into coverage scoring. | |
| - Multi-pass selection: first ensure 360° orientation coverage, then fill gaps. | |
| - Adaptive budgets based on room size / path length (use total motion distance). | |
| - Optionally, trigger reinflation of selected frames only for visualization. | |
| --- | |
| **Deliverables** | |
| 1. Updated `_compute_selected_frames` with motion + coverage logic. | |
| 2. Supporting utilities for pose distance and coverage hashing. | |
| 3. Config hooks & optional environment variables. | |
| 4. Tests covering edge cases (no motion, tiny coverage gains, payload `top_k` override). | |
| 5. Documentation updates describing new behavior and tuning knobs. | |
| --- | |
| ``` | |