```markdown
# Design Doc: Motion- & Coverage-Aware Key Frame Selection
**Author:** Brian Clark
**Last Updated:** 2025-11-07
**Target Components:** `_compute_selected_frames`, Stream3R inference outputs
**Goal:** Replace naive FPS sampling with a strategy that keeps only frames providing new camera poses and meaningful scene coverage, reducing point-cloud clutter and improving 2D scene graphs.
---
## 1. Overview
We combine two complementary signals:
1. **Motion-aware downsampling (Option A):** ensure key frames are spaced by actual camera movement (SE(3) distance), not just time.
2. **Coverage-driven selection (Option B):** prefer frames that contribute new high-confidence geometry after Stream3R processing.
The final key frame list is built by enforcing motion diversity first, then greedily adding frames with the largest uncovered coverage gain until we reach a target budget.
---
## 2. Inputs & Prerequisites
- Per-frame camera extrinsics (`extrinsic`) from Stream3R.
- Optional per-frame quality metrics (blur/confidence) from camera head.
- Stream3R `world_points` and `world_points_conf` (or post-voxel-reduction point maps) to evaluate coverage.
- Library support: NumPy + SciPy (for SE(3) distances), optional Open3D or custom KD-tree for point coverage.
---
## 3. Motion Metrics (Option A)
### 3.1 Pose difference
- Compute translation delta: `||t_i - t_j||`.
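- Compute rotation delta: angle of `R_i * R_j^{-1}` (equivalently `R_i * R_j^T`) via `acos((trace - 1) / 2)`, clamping the argument to `[-1, 1]` to absorb rounding error.
- Combine with weights: `motion = w_t * Δpos + w_r * Δrot`, with defaults `w_t = 1.0` (dimensionless) and `w_r = 0.5 m/rad`, so the combined score is in meters.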
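
A minimal sketch of the pose difference above, assuming each `extrinsic` is a 3×4 or 4×4 matrix `[R | t]`. The helper names `rotation_angle` and `motion_score` are illustrative, not existing functions. If the extrinsics are world-to-camera, the camera position is `-R^T t` rather than `t`, so the translation term may need to be computed on camera centers instead.

```python
import numpy as np


def rotation_angle(R_i: np.ndarray, R_j: np.ndarray) -> float:
    """Geodesic angle (rad) between two 3x3 rotation matrices."""
    R_rel = R_i @ R_j.T  # R_j^{-1} equals R_j^T for a rotation matrix
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Rounding can push cos_theta slightly outside [-1, 1]; clamp before acos.
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def motion_score(T_i, T_j, w_t: float = 1.0, w_r: float = 0.5) -> float:
    """Weighted pose difference: w_t * translation delta + w_r * rotation delta."""
    T_i = np.asarray(T_i, dtype=np.float64)
    T_j = np.asarray(T_j, dtype=np.float64)
    d_pos = np.linalg.norm(T_i[:3, 3] - T_j[:3, 3])
    d_rot = rotation_angle(T_i[:3, :3], T_j[:3, :3])
    return float(w_t * d_pos + w_r * d_rot)
```

With the defaults, a 0.3 m translation combined with a 90° yaw scores `0.3 + 0.5 * π/2`.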
### 3.2 Greedy spacing (temporal pass)
1. Initialize with first frame as key.
2. For each subsequent frame:
   - Accumulate motion distance from last key (sum of per-frame deltas).
   - If distance ≥ `motion_threshold` OR time since last key ≥ `max_gap`, mark as key.
   - Optional: enforce minimum gap (`min_gap_time`) to avoid bursty picks.
3. Result: `motion_keys` – baseline set with adequate pose coverage.
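
The temporal pass can be sketched as follows. This assumes per-frame motion deltas (`step_motion[i]` is the motion score between frames `i-1` and `i`) and timestamps in seconds are already available. The signature is illustrative and differs from the outline's `greedy_motion_selection(frames, ...)`; the defaults are the values listed in Section 7.

```python
from typing import List, Sequence


def greedy_motion_selection(
    step_motion: Sequence[float],
    timestamps: Sequence[float],
    motion_threshold: float = 0.3,
    min_gap: float = 0.25,
    max_gap: float = 2.0,
) -> List[int]:
    """Return indices of key frames spaced by accumulated camera motion."""
    if len(timestamps) == 0:
        return []
    keys = [0]  # the first frame is always a key
    accumulated = 0.0
    for i in range(1, len(timestamps)):
        accumulated += step_motion[i]
        elapsed = timestamps[i] - timestamps[keys[-1]]
        if elapsed < min_gap:
            continue  # too soon after the last key; keep accumulating motion
        if accumulated >= motion_threshold or elapsed >= max_gap:
            keys.append(i)
            accumulated = 0.0
    return keys
```

Note that motion keeps accumulating while the minimum gap is enforced, so a fast camera sweep still produces a key as soon as the gap allows.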
### 3.3 Quality gating (optional)
- Discard frames with low focus / brightness (if metadata available).
- Use confidence summary (mean `world_points_conf`) to veto worst frames before motion selection.
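
One possible form of the confidence veto, sketched under assumptions: `world_points_conf` is taken to have shape `(N, H, W)`, and `veto_fraction` is a hypothetical knob that is not among the settings in this doc.

```python
from typing import List

import numpy as np


def veto_low_confidence(world_points_conf, veto_fraction: float = 0.1) -> List[int]:
    """Return frame indices that survive after dropping the lowest-confidence frames."""
    conf = np.asarray(world_points_conf, dtype=np.float64)
    n_frames = conf.shape[0]
    mean_conf = conf.reshape(n_frames, -1).mean(axis=1)  # one summary per frame
    n_veto = int(veto_fraction * n_frames)
    # Stable sort so ties are vetoed in chronological order.
    worst = set(np.argsort(mean_conf, kind="stable")[:n_veto].tolist())
    return [i for i in range(n_frames) if i not in worst]
```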
---
## 4. Coverage Metrics (Option B)
### 4.1 Coverage data
- For each frame, gather the subset of point cloud indices it contributes above a confidence threshold.
  - Option 1: Use raw `world_points_conf` mask per frame.
  - Option 2: After voxel reduction, store voxel IDs touched by each frame (during inference loop).
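
A sketch of Option 2, assuming `world_points` for one frame can be flattened to `(M, 3)` with a matching `world_points_conf`. The function name and the `voxel_size` default of 0.05 m are placeholders; `conf_thres` follows `coverage_conf_thres`.

```python
from typing import Set, Tuple

import numpy as np


def frame_voxel_ids(
    points, conf, voxel_size: float = 0.05, conf_thres: float = 0.3
) -> Set[Tuple[int, int, int]]:
    """Voxel cells touched by one frame's confident, finite points."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    keep = np.asarray(conf, dtype=np.float64).reshape(-1) >= conf_thres
    keep &= np.isfinite(pts).all(axis=1)  # drop NaN/inf points
    cells = np.floor(pts[keep] / voxel_size).astype(np.int64)
    # A Python set of integer triples deduplicates and supports set difference.
    return set(map(tuple, cells.tolist()))
```

Storing cells as a set keeps the later `new_points` computation a plain set difference, at the cost of more memory than a packed integer hash.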
### 4.2 Greedy coverage selection
1. Start with `coverage_keys = []`, `covered = set()`.
2. For each candidate frame (ordered by motion selection or confidence):
   - Compute `gain = |new_points| / total_points`, where `new_points` is the set of the frame's points not yet in `covered`.
   - Keep a priority queue sorted by gain (breaking ties via motion distance or confidence).
3. While `coverage_keys` size < desired target (`top_k` or auto budget):
   - Pop frame with highest gain.
   - Stop if that gain is below `min_gain_ratio`.
   - Add to `coverage_keys` and update `covered`.
   - Recompute gains lazily rather than for every frame: a frame's gain can only shrink as `covered` grows, so stored values are upper bounds and only the popped frame needs refreshing.
4. Merge with `motion_keys`: `selected = sorted(motion_keys ∪ coverage_keys)` preserving chronological order.
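
The greedy loop above can be sketched with a lazily refreshed max-heap. This assumes `contributions` maps each candidate frame index to the set of voxel IDs (or point hashes) it touches. The signature is illustrative, and ties are broken by frame index here rather than by motion distance or confidence.

```python
import heapq
from typing import Dict, List, Set


def greedy_coverage_selection(
    contributions: Dict[int, Set], budget: int, min_gain_ratio: float = 0.01
) -> List[int]:
    """Pick frames by largest marginal coverage gain, in selection order."""
    total = len(set().union(*contributions.values())) if contributions else 0
    if total == 0 or budget <= 0:
        return []
    covered: Set = set()
    # heapq is a min-heap, so store negated gains to pop the largest first.
    heap = [(-len(cells), frame) for frame, cells in contributions.items()]
    heapq.heapify(heap)
    selected: List[int] = []
    while heap and len(selected) < budget:
        neg_gain, frame = heapq.heappop(heap)
        fresh = len(contributions[frame] - covered)
        if fresh != -neg_gain:
            # Stored gain was stale (an upper bound); reinsert with the real one.
            if fresh > 0:
                heapq.heappush(heap, (-fresh, frame))
            continue
        if fresh / total < min_gain_ratio:
            break  # the best remaining frame adds too little
        selected.append(frame)
        covered |= contributions[frame]
    return selected
```

Because stored gains never underestimate the true gain, a popped frame whose refreshed gain matches its stored value is guaranteed to be the current maximum. The merge step is then `sorted(set(motion_keys) | set(coverage_keys))`.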
### 4.3 Parameters
| Parameter | Purpose | Default |
|-----------|---------|---------|
| `coverage_conf_thres` | Minimum confidence per point | 0.3 |
| `top_k` | Max key frames (if >0) | Provided by payload |
| `auto_budget_seconds` | If `top_k` is not set, key-frame rate used to derive the budget from scene duration | 0.4 fps (≈12 frames for 30 s) |
| `min_gain_ratio` | Stop if marginal gain < threshold | 0.01 |
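
As a worked example of the budget rule in the table, a small sketch that prefers `top_k` and otherwise scales the 0.4 fps rate by scene duration. The function and its `auto_rate_fps` argument are hypothetical names, and rounding to the nearest frame with a floor of one is an assumption.

```python
def resolve_budget(top_k: int, duration_s: float, auto_rate_fps: float = 0.4) -> int:
    """Key-frame budget: explicit top_k if positive, else rate * duration."""
    if top_k > 0:
        return top_k
    # Round to the nearest whole frame and always keep at least one.
    return max(1, round(auto_rate_fps * duration_s))
```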
---
## 5. Algorithm Outline
```text
1. Precompute per-frame metadata:
   - Motion deltas & cumulative distance
   - Frame quality/confidence
   - Coverage contributions (voxel IDs or hashed points)
2. Motion pass:
   motion_keys = greedy_motion_selection(frames, motion_threshold, min_gap, max_gap)
3. Coverage pass:
   candidates = frames filtered by quality & (if large scenes) downsampled using motion_keys as seeds
   coverage_keys = greedy_coverage_selection(candidates, contributions, budget)
4. Combine & finalize:
   selected = sort(unique(motion_keys ∪ coverage_keys))
   if len(selected) > budget: prune lowest coverage gain while keeping motion anchors
   collect metadata (confidence, motion distance, coverage gain) for diagnostics
5. Optional reinflation pass (if enabled) to restore splat density for the selected frames only.
6. Emit diagnostics in `selected_frames.json`.
```
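Step 4 of the outline can be sketched as below. This assumes `coverage_gain` maps frame index to the gain recorded when the frame was picked; the function name is illustrative. Motion anchors are never pruned in this sketch, so the result can exceed `budget` when the motion pass alone returns more frames than the budget allows.

```python
from typing import Dict, Iterable, List


def combine_and_prune(
    motion_keys: Iterable[int],
    coverage_keys: Iterable[int],
    coverage_gain: Dict[int, float],
    budget: int,
) -> List[int]:
    """Union both key sets, then drop the weakest coverage-only frames."""
    anchors = set(motion_keys)
    selected = anchors | set(coverage_keys)
    # Coverage-only frames, weakest gain first (frame index breaks ties).
    extras = sorted(selected - anchors, key=lambda f: (coverage_gain.get(f, 0.0), f))
    while len(selected) > budget and extras:
        selected.discard(extras.pop(0))
    return sorted(selected)  # chronological order
```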
---
## 6. Integration Points
### 6.1 `_compute_selected_frames`
- Extend signature to accept:
  - `frame_records` (already present)
  - `extrinsics`, `world_points`, `world_points_conf`
  - optional `confidence_summary`, `frame_timestamps`
- Return list of dicts with fields: `frame_id`, `motion_score`, `coverage_gain`, `cum_motion`, etc., so the artifact can explain the reasoning.
### 6.2 Inference loop
- While iterating frames, record:
  - Pose deltas (store to arrays for later).
  - Coverage bitsets: e.g., hash voxel indices (`np.floor(world_points / voxel_size)`).
  - Quality metrics (mean conf, brightness).
### 6.3 Job artifacts
- Include selection diagnostics in `selected_frames.json`:
```json
{
  "frame_id": "...",
  "motion_distance": 0.45,
  "coverage_gain": 0.12,
  "decision": "coverage"
}
```
- Enables auditing the chosen frames.
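
A sketch of how the diagnostics could be serialized, assuming `selected_frames.json` holds a list of per-frame objects with the four fields shown above. The helper name and the list layout are assumptions; extra keys in a record are dropped and missing ones become `null`.

```python
import json
from typing import Dict, List

DIAGNOSTIC_FIELDS = ("frame_id", "motion_distance", "coverage_gain", "decision")


def selection_diagnostics_json(records: List[Dict]) -> str:
    """Serialize per-frame selection records to a JSON array."""
    payload = [{name: rec.get(name) for name in DIAGNOSTIC_FIELDS} for rec in records]
    return json.dumps(payload, indent=2)
```

The returned string can be written next to the other job artifacts, for example with `pathlib.Path("selected_frames.json").write_text(...)`.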
### 6.4 Two-pass pipeline hook
- Add a config flag (e.g., `STREAM3R_KEYFRAME_PREPASS`) to toggle a lightweight pre-pass.
- **Pre-pass steps:**
  1. Collect frames as usual.
  2. Run a reduced inference loop (camera head only or full Stream3R with artifact generation disabled) to gather motion and coverage metadata.
  3. Execute the key-frame selection algorithm to produce selected indices.
- **Main pass:**
  1. Filter `frame_records` to the selected indices.
  2. If the batch size is below a configured maximum, switch inference to full attention; otherwise remain in window mode.
  3. Run the full artifact pipeline (pointmaps, GLB, reinflation) on the reduced set.
  4. Persist selection diagnostics alongside artifacts.
- Provide a fallback path: if the pre-pass fails or returns too few frames, revert to the original sampling strategy so the job still succeeds.
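
The toggle and fallback could look like the sketch below. Several details are assumptions: the flag is treated as enabled when set to `"1"`, the original sampling strategy is approximated by a fixed stride, `run_prepass` is a hypothetical callable that returns selected indices, and `min_frames` stands in for "too few frames".

```python
import logging
import os
from typing import Callable, List, Mapping, Optional, Sequence

logger = logging.getLogger(__name__)


def choose_frame_indices(
    n_frames: int,
    run_prepass: Callable[[], Sequence[int]],
    fallback_stride: int,
    min_frames: int = 2,
    env: Optional[Mapping[str, str]] = None,
) -> List[int]:
    """Use pre-pass key frames when enabled and healthy, else stride sampling."""
    env = os.environ if env is None else env
    fallback = list(range(0, n_frames, fallback_stride))
    if env.get("STREAM3R_KEYFRAME_PREPASS", "0") != "1":
        return fallback
    try:
        selected = sorted(set(run_prepass()))
    except Exception:
        # Any pre-pass failure must not fail the job; log and fall back.
        logger.exception("key-frame pre-pass failed; using fallback sampling")
        return fallback
    return selected if len(selected) >= min_frames else fallback
```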
---
## 7. Configuration & Defaults
| Setting | Description | Default |
|---------|-------------|---------|
| `STREAM3R_KEYFRAME_MOTION_THRESH` | Motion distance (m) to trigger new key | 0.3 |
| `STREAM3R_KEYFRAME_ROT_THRESH` | Rotation weight `w_r` in the motion score (m/rad) | 0.5 |
| `STREAM3R_KEYFRAME_MIN_GAP` | Minimum time gap (s) | 0.25 |
| `STREAM3R_KEYFRAME_MAX_GAP` | Max time between keys (s) | 2.0 |
| `STREAM3R_KEYFRAME_TOP_K` | Max number of key frames | 18 (overridable per payload) |
| `STREAM3R_KEYFRAME_MIN_GAIN` | Coverage gain stop threshold | 0.01 |
| `STREAM3R_KEYFRAME_CONF_THRESH` | Confidence threshold for coverage | 0.3 |
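
One way to load these settings, sketched with a frozen dataclass. The class and field names are illustrative; only the environment variable names and defaults come from the table. Unset or empty variables fall back to the defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class KeyframeConfig:
    motion_thresh: float = 0.3  # STREAM3R_KEYFRAME_MOTION_THRESH (m)
    rot_weight: float = 0.5     # STREAM3R_KEYFRAME_ROT_THRESH (m/rad)
    min_gap: float = 0.25       # STREAM3R_KEYFRAME_MIN_GAP (s)
    max_gap: float = 2.0        # STREAM3R_KEYFRAME_MAX_GAP (s)
    top_k: int = 18             # STREAM3R_KEYFRAME_TOP_K
    min_gain: float = 0.01      # STREAM3R_KEYFRAME_MIN_GAIN
    conf_thresh: float = 0.3    # STREAM3R_KEYFRAME_CONF_THRESH

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "KeyframeConfig":
        env = os.environ if env is None else env
        base = cls()

        def read(suffix: str, cast, default):
            raw = env.get(f"STREAM3R_KEYFRAME_{suffix}")
            return default if raw is None or raw == "" else cast(raw)

        return cls(
            motion_thresh=read("MOTION_THRESH", float, base.motion_thresh),
            rot_weight=read("ROT_THRESH", float, base.rot_weight),
            min_gap=read("MIN_GAP", float, base.min_gap),
            max_gap=read("MAX_GAP", float, base.max_gap),
            top_k=read("TOP_K", int, base.top_k),
            min_gain=read("MIN_GAIN", float, base.min_gain),
            conf_thresh=read("CONF_THRESH", float, base.conf_thresh),
        )
```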
---
## 8. Validation Plan
1. **Quantitative**
   - Compare key frame counts vs. baseline (2 fps sampling).
   - Measure point coverage retention (% of original points represented by key frames).
   - Evaluate overlap with heuristic linear sampling (should be reduced).
2. **Qualitative**
   - Visual inspection: point cloud clutter reduction, better 2D scene graph clarity.
   - Spot-check key-frame artifacts (diagnostic metadata) to ensure decisions align with expectations.
3. **Performance**
   - Ensure coverage computations remain efficient (hash-based; track memory usage).
   - Add timing logs in `_compute_selected_frames`.
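
The coverage retention metric could be computed as in this sketch, which assumes the same per-frame `contributions` mapping (frame index to voxel IDs) used by the coverage pass and measures retention over voxels rather than raw points. The function name is illustrative.

```python
from typing import Dict, Iterable, Set


def coverage_retention(contributions: Dict[int, Set], selected: Iterable[int]) -> float:
    """Fraction of all observed voxels that the selected frames still cover."""
    all_cells = set().union(*contributions.values()) if contributions else set()
    if not all_cells:
        return 1.0  # nothing to cover, so nothing is lost
    kept = set().union(*(contributions[f] for f in selected))
    return len(kept) / len(all_cells)
```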
---
## 9. Future Extensions
- Integrate image-content heuristics (entropy, saliency) into coverage scoring.
- Multi-pass selection: first ensure 360° orientation coverage, then fill gaps.
- Adaptive budgets based on room size / path length (use total motion distance).
- Optionally, trigger reinflation of selected frames only for visualization.
---
**Deliverables**
1. Updated `_compute_selected_frames` with motion + coverage logic.
2. Supporting utilities for pose distance and coverage hashing.
3. Config hooks & optional environment variables.
4. Tests covering edge cases (no motion, tiny coverage gains, payload `top_k` override).
5. Documentation updates describing new behavior and tuning knobs.
---
```