```markdown
# Design Doc: Motion- & Coverage-Aware Key Frame Selection

**Author:** Brian Clark  
**Last Updated:** 2025-11-07  
**Target Components:** `_compute_selected_frames`, Stream3R inference outputs  
**Goal:** Replace naive FPS sampling with a strategy that keeps only frames providing new camera poses and meaningful scene coverage, reducing point-cloud clutter and improving 2D scene graphs.

---

## 1. Overview

We combine two complementary signals:

1. **Motion-aware downsampling (Option A):** ensure key frames are spaced by actual camera movement (SE(3) distance), not just time.
2. **Coverage-driven selection (Option B):** prefer frames that contribute new high-confidence geometry after Stream3R processing.

The final key frame list is built by enforcing motion diversity first, then greedily adding the frames that contribute the most not-yet-covered geometry until we reach a target budget.

---

## 2. Inputs & Prerequisites

- Per-frame camera extrinsics (`extrinsic`) from Stream3R.
- Optional per-frame quality metrics (blur/confidence) from the camera head.
- Stream3R `world_points` and `world_points_conf` (or post-voxel-reduction point maps) to evaluate coverage.
- Library support: NumPy + SciPy (for SE(3) distances), optional Open3D or custom KD-tree for point coverage.

---

## 3. Motion Metrics (Option A)

### 3.1 Pose difference
- Compute translation delta: `||t_i - t_j||`.
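- Compute rotation delta: angle of the relative rotation `R_i * R_j^{-1}` (where `R_j^{-1} = R_j^T`) via `acos((trace - 1) / 2)`, clamping the argument to `[-1, 1]` to guard against round-off.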
- Combine with weights (e.g., `motion = w_t * Δpos + w_r * Δrot`), with defaults `w_t=1.0`, `w_r=0.5 m/rad`.
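
The pose difference above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline's implementation: the function name `pose_motion` is hypothetical, and it assumes each pose is a 4x4 camera-to-world matrix so that the translation column is the camera position. If the extrinsics are world-to-camera, they would need inverting first.

```python
import numpy as np


def pose_motion(T_i, T_j, w_t=1.0, w_r=0.5):
    """Weighted motion between two 4x4 poses (assumed camera-to-world)."""
    R_i, t_i = T_i[:3, :3], T_i[:3, 3]
    R_j, t_j = T_j[:3, :3], T_j[:3, 3]

    # Translation delta in metres.
    d_pos = float(np.linalg.norm(t_i - t_j))

    # Relative rotation; the inverse of a rotation matrix is its transpose.
    R_rel = R_i @ R_j.T
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0
    # Clamp so round-off cannot push the argument outside acos's domain.
    d_rot = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    return w_t * d_pos + w_r * d_rot
```

With the default weights, one radian of rotation counts the same as 0.5 m of translation.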

### 3.2 Greedy spacing (temporal pass)
1. Initialize with first frame as key.
2. For each subsequent frame:
   - Accumulate motion distance from last key (sum of per-frame deltas).
   - If distance ≥ `motion_threshold` OR time since last key ≥ `max_gap`, mark as key.
   - Optional: enforce minimum gap (`min_gap_time`) to avoid bursty picks.
3. Result: `motion_keys` – baseline set with adequate pose coverage.
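
A minimal sketch of the temporal pass, under stated assumptions: `step_motion[i]` holds the motion score between frame `i-1` and frame `i`, `timestamps` are in seconds, and the minimum gap is treated as a hard constraint that takes precedence over both triggers. The function and argument names are illustrative.

```python
def greedy_motion_selection(step_motion, timestamps, motion_threshold=0.3,
                            min_gap=0.25, max_gap=2.0):
    """Return the indices of motion key frames, in chronological order."""
    if not timestamps:
        return []

    keys = [0]            # the first frame is always a key
    accumulated = 0.0     # motion accumulated since the last key

    for i in range(1, len(timestamps)):
        accumulated += step_motion[i]
        elapsed = timestamps[i] - timestamps[keys[-1]]

        if elapsed < min_gap:
            continue      # too soon after the last key; keep accumulating

        if accumulated >= motion_threshold or elapsed >= max_gap:
            keys.append(i)
            accumulated = 0.0

    return keys
```

For a static camera the `max_gap` rule alone drives selection, so a key is still emitted every `max_gap` seconds.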

### 3.3 Quality gating (optional)
- Discard frames with low focus / brightness (if metadata available).
- Use confidence summary (mean `world_points_conf`) to veto worst frames before motion selection.

---

## 4. Coverage Metrics (Option B)

### 4.1 Coverage data
- For each frame, gather the subset of point cloud indices it contributes above a confidence threshold.
  - Option 1: Use raw `world_points_conf` mask per frame.
  - Option 2: After voxel reduction, store voxel IDs touched by each frame (during inference loop).
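
As an illustration of Option 2, the sketch below turns one frame's point map into a set of voxel IDs. It assumes `world_points` has shape `(H, W, 3)` and `world_points_conf` has shape `(H, W)`; the function name and the `voxel_size` default of 0.05 m are assumptions made for this example.

```python
import numpy as np


def frame_voxel_ids(world_points, world_points_conf,
                    voxel_size=0.05, conf_thres=0.3):
    """Return the set of voxel IDs touched by confident points of one frame."""
    mask = world_points_conf >= conf_thres       # (H, W) boolean mask
    pts = world_points[mask]                     # (N, 3) confident points
    # Integer voxel coordinates; floor keeps negative coordinates consistent.
    vox = np.floor(pts / voxel_size).astype(np.int64)
    # Tuples are hashable, so the set deduplicates points in the same voxel.
    return set(map(tuple, vox.tolist()))
```

Sets of integer tuples are easy to union and subtract during coverage selection, at the cost of more memory than a packed bitset.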

### 4.2 Greedy coverage selection
1. Start with `coverage_keys = []`, `covered = set()`.
2. For each candidate frame (ordered by motion selection or confidence):
   - Compute `gain = |new_points| / total_points`, where `new_points` is the set of the frame's points not yet in `covered`.
   - Keep a priority queue sorted by gain (breaking ties via motion distance or confidence).
3. While `coverage_keys` size < desired target (`top_k` or auto budget):
   - Pop frame with highest gain; stop if that gain is below `min_gain_ratio`.
   - Add to `coverage_keys` and update `covered`.
   - Recompute gains lazily: a stored gain is only an upper bound, because a frame's gain can only shrink as `covered` grows.
4. Merge with `motion_keys`: `selected = sorted(motion_keys ∪ coverage_keys)` preserving chronological order.
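
The coverage pass can be sketched as a lazy greedy loop over a heap. This is one possible reading of the steps above, with assumptions called out: `contributions` maps integer frame IDs to sets of point or voxel IDs, `total_points` is taken to be the size of their union, and ties are broken by frame ID rather than by motion distance or confidence.

```python
import heapq


def greedy_coverage_selection(contributions, budget, min_gain_ratio=0.01):
    """Return [(frame_id, gain), ...] in the order frames were picked."""
    total = len(set().union(*contributions.values())) if contributions else 0
    if total == 0 or budget <= 0:
        return []

    covered = set()
    # heapq is a min-heap, so gains are negated. Stored gains may be stale
    # but are always upper bounds on the current gain.
    heap = [(-len(ids) / total, fid) for fid, ids in contributions.items()]
    heapq.heapify(heap)

    selected = []
    while heap and len(selected) < budget:
        _, fid = heapq.heappop(heap)
        gain = len(contributions[fid] - covered) / total   # refresh
        if heap and gain < -heap[0][0]:
            # Another frame may now be better; re-queue with the fresh gain.
            heapq.heappush(heap, (-gain, fid))
            continue
        if gain < min_gain_ratio:
            break            # best remaining frame adds too little
        selected.append((fid, gain))
        covered |= contributions[fid]
    return selected
```

Because gains never increase, a frame whose refreshed gain still beats the next stored gain is guaranteed to be the current best, which avoids rescanning every candidate on each iteration.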

### 4.3 Parameters
| Parameter | Purpose | Default |
|-----------|---------|---------|
| `coverage_conf_thres` | Minimum confidence per point | 0.3 |
| `top_k` | Max key frames (if >0) | Taken from payload |
| `auto_budget_seconds` | If `top_k` is not set, target key-frame rate over the scene duration | 0.4 fps (≈12 frames for 30 s) |
| `min_gain_ratio` | Stop if marginal gain < threshold | 0.01 |
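
The budget arithmetic implied by the table is small enough to show directly. The helper below is a hypothetical sketch: it assumes a non-positive or missing `top_k` means "not set" and that the automatic budget is the scene duration multiplied by the 0.4 fps target, with a floor of one frame.

```python
def resolve_budget(top_k, duration_s, auto_fps=0.4):
    """Pick the key-frame budget from an explicit top_k or the scene length."""
    if top_k and top_k > 0:
        return top_k
    # Automatic budget: target rate times duration, never less than one frame.
    return max(1, round(duration_s * auto_fps))
```

For a 30 s scene with no `top_k`, this yields 12 frames, matching the example in the table.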

---

## 5. Algorithm Outline

```text
1. Precompute per-frame metadata:
   - Motion deltas & cumulative distance
   - Frame quality/confidence
   - Coverage contributions (voxel IDs or hashed points)

2. Motion pass:
   motion_keys = greedy_motion_selection(frames, motion_threshold, min_gap, max_gap)

3. Coverage pass:
   candidates = frames filtered by quality & (if large scenes) downsampled using motion_keys as seeds
   coverage_keys = greedy_coverage_selection(candidates, contributions, budget)

4. Combine & finalize:
   selected = sort(unique(motion_keys ∪ coverage_keys))
   if len(selected) > budget: prune lowest coverage gain while keeping motion anchors
   collect metadata (confidence, motion distance, coverage gain) for diagnostics

5. Optional reinflation pass (if enabled) to restore splat density for the selected frames only.

6. Emit diagnostics in `selected_frames.json`.
```
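
Step 4 of the outline can be sketched as follows. The helper name and data shapes are assumptions: `motion_keys` is a list of frame indices and `coverage_gain` maps each coverage key to the gain it had when selected. Motion anchors are never pruned in this sketch, so the result can exceed `budget` when the anchors alone do.

```python
def combine_and_prune(motion_keys, coverage_gain, budget):
    """Merge both key sets, then drop low-gain coverage frames over budget."""
    anchors = set(motion_keys)
    selected = anchors | set(coverage_gain)

    # Only non-anchor frames may be pruned, lowest coverage gain first.
    removable = sorted((i for i in selected if i not in anchors),
                       key=lambda i: coverage_gain[i])
    while len(selected) > budget and removable:
        selected.discard(removable.pop(0))

    return sorted(selected)   # chronological order
```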

---

## 6. Integration Points

### 6.1 `_compute_selected_frames`
- Extend signature to accept:
  - `frame_records` (already present)
  - `extrinsics`, `world_points`, `world_points_conf`
  - optional `confidence_summary`, `frame_timestamps`
- Return list of dicts with fields: `frame_id`, `motion_score`, `coverage_gain`, `cum_motion`, etc., so the artifact can explain the reasoning.

### 6.2 Inference loop
- While iterating frames, record:
  - Pose deltas (store to arrays for later).
  - Coverage bitsets: e.g., hash voxel indices (`np.floor(world_points / voxel_size)`).
  - Quality metrics (mean conf, brightness).

### 6.3 Job artifacts
- Include selection diagnostics in `selected_frames.json`:
  ```json
  {
    "frame_id": "...",
    "motion_distance": 0.45,
    "coverage_gain": 0.12,
    "decision": "coverage"
  }
  ```
- Enables auditing the chosen frames.
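
A sketch of how such records might be assembled before being written to `selected_frames.json`. The helper name is hypothetical, and the `"motion"` decision label is an assumption; only `"coverage"` appears in the example above.

```python
import json


def build_diagnostics(selected, motion_keys, motion_distance, coverage_gain):
    """Build one diagnostic record per selected frame."""
    anchors = set(motion_keys)
    return [
        {
            "frame_id": str(fid),
            "motion_distance": round(motion_distance.get(fid, 0.0), 4),
            "coverage_gain": round(coverage_gain.get(fid, 0.0), 4),
            # Frames picked by the motion pass keep that label even if the
            # coverage pass would also have chosen them.
            "decision": "motion" if fid in anchors else "coverage",
        }
        for fid in selected
    ]


records = build_diagnostics(
    selected=[0, 3],
    motion_keys=[0],
    motion_distance={0: 0.0, 3: 0.45},
    coverage_gain={3: 0.12},
)
print(json.dumps(records, indent=2))
```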

### 6.4 Two-pass pipeline hook
- Add a config flag (e.g., `STREAM3R_KEYFRAME_PREPASS`) to toggle a lightweight pre-pass.
- **Pre-pass steps:**
  1. Collect frames as usual.
  2. Run a reduced inference loop (camera head only or full Stream3R with artifact generation disabled) to gather motion and coverage metadata.
  3. Execute the key-frame selection algorithm to produce selected indices.
- **Main pass:**
  1. Filter `frame_records` to the selected indices.
  2. If the batch size is below a configured maximum, switch inference to full attention; otherwise remain in window mode.
  3. Run the full artifact pipeline (pointmaps, GLB, reinflation) on the reduced set.
  4. Persist selection diagnostics alongside artifacts.
- Provide a fallback path: if the pre-pass fails or returns too few frames, revert to the original sampling strategy so the job still succeeds.
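
The fallback path might look like the sketch below. Every name here is hypothetical: `prepass` stands for the pre-pass plus key-frame selection and returns frame indices, `baseline_sampler` stands for the original sampling strategy, and `min_frames` is an assumed threshold for "too few frames".

```python
def select_with_fallback(frame_records, prepass, baseline_sampler,
                         min_frames=3):
    """Return (records, strategy), reverting to baseline sampling on failure."""
    try:
        indices = prepass(frame_records)
    except Exception:
        # Any pre-pass failure must not fail the job; fall through to baseline.
        indices = []

    if len(indices) < min_frames:
        return baseline_sampler(frame_records), "fallback"

    return [frame_records[i] for i in sorted(indices)], "keyframe_prepass"
```

Returning the strategy label alongside the records lets the diagnostics record which path produced the final frame set.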

---

## 7. Configuration & Defaults

| Setting | Description | Default |
|---------|-------------|---------|
| `STREAM3R_KEYFRAME_MOTION_THRESH` | Accumulated motion distance (m) that triggers a new key | 0.3 |
| `STREAM3R_KEYFRAME_ROT_THRESH` | Weight `w_r` (m/rad) applied to the rotation angle in the motion score | 0.5 |
| `STREAM3R_KEYFRAME_MIN_GAP` | Minimum time gap (s) | 0.25 |
| `STREAM3R_KEYFRAME_MAX_GAP` | Max time between keys (s) | 2.0 |
| `STREAM3R_KEYFRAME_TOP_K` | Max number of key frames | 18 (overridable per payload) |
| `STREAM3R_KEYFRAME_MIN_GAIN` | Coverage gain stop threshold | 0.01 |
| `STREAM3R_KEYFRAME_CONF_THRESH` | Confidence threshold for coverage | 0.3 |
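
One way to load these settings is a frozen dataclass populated from environment variables, sketched below. The class and function names are illustrative and the parsing rules (an empty string means unset) are an assumption; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyframeConfig:
    motion_thresh: float = 0.3   # metres of accumulated motion per key
    rot_weight: float = 0.5      # m/rad, read from ..._ROT_THRESH
    min_gap: float = 0.25        # seconds
    max_gap: float = 2.0         # seconds
    top_k: int = 18
    min_gain: float = 0.01
    conf_thresh: float = 0.3


def load_keyframe_config(env=None):
    """Read STREAM3R_KEYFRAME_* settings, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = KeyframeConfig()

    def read(suffix, cast, default):
        raw = env.get("STREAM3R_KEYFRAME_" + suffix)
        return default if raw is None or raw == "" else cast(raw)

    return KeyframeConfig(
        motion_thresh=read("MOTION_THRESH", float, defaults.motion_thresh),
        rot_weight=read("ROT_THRESH", float, defaults.rot_weight),
        min_gap=read("MIN_GAP", float, defaults.min_gap),
        max_gap=read("MAX_GAP", float, defaults.max_gap),
        top_k=read("TOP_K", int, defaults.top_k),
        min_gain=read("MIN_GAIN", float, defaults.min_gain),
        conf_thresh=read("CONF_THRESH", float, defaults.conf_thresh),
    )
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the real environment.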

---

## 8. Validation Plan

1. **Quantitative**
   - Compare key frame counts vs. baseline (2 fps sampling).
   - Measure point coverage retention (% of original points represented by key frames).
   - Evaluate overlap with heuristic linear sampling (should be reduced).
2. **Qualitative**
   - Visual inspection: point cloud clutter reduction, better 2D scene graph clarity.
   - Spot-check key-frame artifacts (diagnostic metadata) to ensure decisions align with expectations.
3. **Performance**
   - Ensure coverage computations remain efficient (hash-based; track memory usage).
   - Add timing logs in `_compute_selected_frames`.
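
The coverage-retention metric from the quantitative checks could be computed as sketched here, assuming the same `contributions` mapping (frame ID to set of point or voxel IDs) used during selection. The function name is hypothetical.

```python
def coverage_retention(contributions, selected):
    """Fraction of all point/voxel IDs still covered by the selected frames."""
    all_ids = set().union(*contributions.values()) if contributions else set()
    if not all_ids:
        return 0.0
    kept = set()
    for fid in selected:
        kept |= contributions[fid]
    return len(kept) / len(all_ids)
```

A value near 1.0 with far fewer frames than the 2 fps baseline would indicate the selection is discarding mostly redundant views.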

---

## 9. Future Extensions

- Integrate image-content heuristics (entropy, saliency) into coverage scoring.
- Multi-pass selection: first ensure 360° orientation coverage, then fill gaps.
- Adaptive budgets based on room size / path length (use total motion distance).
- Optionally, trigger reinflation of selected frames only for visualization.

---

**Deliverables**
1. Updated `_compute_selected_frames` with motion + coverage logic.
2. Supporting utilities for pose distance and coverage hashing.
3. Config hooks & optional environment variables.
4. Tests covering edge cases (no motion, tiny coverage gains, payload `top_k` override).
5. Documentation updates describing new behavior and tuning knobs.

---
```