Commit de1bede · Parent(s): 1aa7485 · keyframe selection
design_docs/keyframe_selection_motion_coverage.md ADDED
@@ -0,0 +1,195 @@
# Design Doc: Motion- & Coverage-Aware Key Frame Selection

**Author:** Brian Clark
**Last Updated:** 2025-11-07
**Target Components:** `_compute_selected_frames`, Stream3R inference outputs
**Goal:** Replace naive FPS sampling with a strategy that keeps only frames providing new camera poses and meaningful scene coverage, reducing point-cloud clutter and improving 2D scene graphs.

---

## 1. Overview

We combine two complementary signals:

1. **Motion-aware downsampling (Option A):** ensure key frames are spaced by actual camera movement (SE(3) distance), not just time.
2. **Coverage-driven selection (Option B):** prefer frames that contribute new high-confidence geometry after Stream3R processing.

The final key frame list is built by enforcing motion diversity first, then greedily adding frames with the largest uncovered coverage gain until we reach a target budget.

---

## 2. Inputs & Prerequisites

- Per-frame camera extrinsics (`extrinsic`) from Stream3R.
- Optional per-frame quality metrics (blur/confidence) from the camera head.
- Stream3R `world_points` and `world_points_conf` (or post-voxel-reduction point maps) to evaluate coverage.
- Library support: NumPy + SciPy (for SE(3) distances), optional Open3D or a custom KD-tree for point coverage.

---
## 3. Motion Metrics (Option A)

### 3.1 Pose difference

- Compute translation delta: `||t_i - t_j||`.
- Compute rotation delta: angle of `R_i * R_j^{-1}` via `acos((trace - 1) / 2)`.
- Combine with weights (e.g., `motion = w_t * Δpos + w_r * Δrot`), with defaults `w_t=1.0`, `w_r=0.5 m/rad`.
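The pose-difference metric above can be sketched in a few lines (an illustrative helper using the stated defaults, not the worker's implementation):

```python
import numpy as np

def pose_distance(R_i, t_i, R_j, t_j, w_t=1.0, w_r=0.5):
    """Weighted SE(3) distance between two camera poses.

    Translation delta is the Euclidean norm; rotation delta is the angle
    of the relative rotation R_i * R_j^T, via acos((trace - 1) / 2),
    clipped for numerical safety near 0 and pi.
    """
    d_pos = np.linalg.norm(np.asarray(t_i, dtype=float) - np.asarray(t_j, dtype=float))
    rel = np.asarray(R_i) @ np.asarray(R_j).T
    d_rot = np.arccos(np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0))
    return w_t * d_pos + w_r * d_rot

# Identical poses score 0; a pure 90-degree yaw contributes w_r * pi/2.
R_id = np.eye(3)
R_yaw90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

Because the metric mixes meters and radians, pure rotation (e.g., panning in place) still accumulates motion distance and is not mistaken for a stationary camera.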
### 3.2 Greedy spacing (temporal pass)

1. Initialize with first frame as key.
2. For each subsequent frame:
   - Accumulate motion distance from last key (sum of per-frame deltas).
   - If distance ≥ `motion_threshold` OR time since last key ≥ `max_gap`, mark as key.
   - Optional: enforce minimum gap (`min_gap_time`) to avoid bursty picks.
3. Result: `motion_keys` – baseline set with adequate pose coverage.
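The temporal pass can be summarized as follows (a simplified sketch: gaps are counted in frames rather than seconds, and the optional `min_gap_time` is omitted):

```python
def greedy_motion_selection(motion_deltas, motion_threshold, max_gap):
    """Greedy spacing: emit a key once accumulated motion since the last
    key reaches motion_threshold, or max_gap frames have elapsed.

    motion_deltas[i] is the combined motion between frames i-1 and i.
    """
    keys = [0]  # step 1: the first frame is always a key
    accumulated = 0.0
    gap = 0
    for idx in range(1, len(motion_deltas)):
        accumulated += motion_deltas[idx]  # step 2: accumulate from last key
        gap += 1
        if accumulated >= motion_threshold or gap >= max_gap:
            keys.append(idx)
            accumulated = 0.0
            gap = 0
    return keys

# A stationary stretch only yields keys via max_gap; a motion burst
# triggers a key as soon as the threshold is crossed.
deltas = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0]
```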
### 3.3 Quality gating (optional)

- Discard frames with low focus / brightness (if metadata available).
- Use confidence summary (mean `world_points_conf`) to veto worst frames before motion selection.

---
## 4. Coverage Metrics (Option B)

### 4.1 Coverage data

- For each frame, gather the subset of point cloud indices it contributes above a confidence threshold.
- Option 1: Use raw `world_points_conf` mask per frame.
- Option 2: After voxel reduction, store voxel IDs touched by each frame (during inference loop).
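Option 2 can be sketched by quantizing points to integer voxel cells and hashing each cell to a single id (an illustrative helper; the prime constants are the usual spatial-hash triple):

```python
import numpy as np

def frame_voxel_ids(points, conf, conf_thres=0.3, voxel_size=0.05):
    """Coverage contribution of one frame: the set of voxel cells touched
    by its points whose confidence clears the threshold.

    points: (N, 3) world coordinates, conf: (N,) confidences.
    """
    keep = np.asarray(conf) >= conf_thres
    if not np.any(keep):
        return set()
    cells = np.floor(np.asarray(points, dtype=float)[keep] / voxel_size).astype(np.int64)
    # Hash integer (x, y, z) cells down to scalars so per-frame sets stay cheap.
    primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
    return set((cells @ primes).tolist())

# Two confident points in the same 5 cm cell collapse to one voxel id;
# the low-confidence point is ignored.
pts = np.array([[0.01, 0.01, 0.01], [0.02, 0.02, 0.02], [1.0, 1.0, 1.0]])
conf = np.array([0.9, 0.9, 0.1])
```

Storing hashed ids instead of raw points keeps the per-frame sets small, which matters because the greedy pass repeatedly takes set differences.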
### 4.2 Greedy coverage selection

1. Start with `coverage_keys = []`, `covered = set()`.
2. For each candidate frame (ordered by motion selection or confidence):
   - Compute `gain = new_points / total_points`, where `new_points = {points not in covered}`.
   - Keep a priority queue sorted by gain (breaking ties via motion distance or confidence).
3. While `coverage_keys` size < desired target (`top_k` or auto budget):
   - Pop frame with highest gain.
   - Add to `coverage_keys` and update `covered`.
   - Recompute gains lazily or maintain stored values (since coverage shrinks).
4. Merge with `motion_keys`: `selected = sorted(motion_keys ∪ coverage_keys)` preserving chronological order.
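The selection loop above, sketched with plain sets (recomputing gains each round stands in for the priority queue; `contributions` maps frame index to its coverage set):

```python
def greedy_coverage_selection(contributions, budget, min_gain_ratio=0.01):
    """Greedily pick frames by largest fraction of still-uncovered ids,
    stopping at the budget or when marginal gain falls below the floor."""
    total = len(set().union(*contributions.values())) or 1
    covered = set()
    keys = []
    candidates = dict(contributions)
    while candidates and len(keys) < budget:
        # Coverage shrinks as 'covered' grows, so gains are recomputed
        # every iteration (the lazy-update strategy from step 3).
        idx, gain = max(
            ((i, len(ids - covered)) for i, ids in candidates.items()),
            key=lambda pair: pair[1],
        )
        if gain / total < min_gain_ratio:
            break
        keys.append(idx)
        covered |= candidates.pop(idx)
    return keys

# Frame 0 covers the most; frame 2 adds nothing new once 0 is taken.
contribs = {0: {1, 2, 3}, 1: {3, 4}, 2: {1, 2}}
```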
### 4.3 Parameters

| Parameter | Purpose | Default |
|-----------|---------|---------|
| `coverage_conf_thres` | Minimum confidence per point | 0.3 |
| `top_k` | Max key frames (if >0) | Provided payload |
| `auto_budget_seconds` | If `top_k` not set, target frames per scene duration | 0.4 fps (≈12 frames for 30 s) |
| `min_gain_ratio` | Stop if marginal gain < threshold | 0.01 |

---
## 5. Algorithm Outline

```text
1. Precompute per-frame metadata:
   - Motion deltas & cumulative distance
   - Frame quality/confidence
   - Coverage contributions (voxel IDs or hashed points)

2. Motion pass:
   motion_keys = greedy_motion_selection(frames, motion_threshold, min_gap, max_gap)

3. Coverage pass:
   candidates = frames filtered by quality & (if large scenes) downsampled using motion_keys as seeds
   coverage_keys = greedy_coverage_selection(candidates, contributions, budget)

4. Combine & finalize:
   selected = sort(unique(motion_keys ∪ coverage_keys))
   if len(selected) > budget: prune lowest coverage gain while keeping motion anchors
   collect metadata (confidence, motion distance, coverage gain) for diagnostics

5. Optional reinflation pass (if enabled) to restore splat density for the selected frames only.

6. Emit diagnostics in `selected_frames.json`.
```

---
## 6. Integration Points

### 6.1 `_compute_selected_frames`

- Extend signature to accept:
  - `frame_records` (already present)
  - `extrinsics`, `world_points`, `world_points_conf`
  - optional `confidence_summary`, `frame_timestamps`
- Return list of dicts with fields: `frame_id`, `motion_score`, `coverage_gain`, `cum_motion`, etc., so the artifact can explain the reasoning.

### 6.2 Inference loop

- While iterating frames, record:
  - Pose deltas (store to arrays for later).
  - Coverage bitsets: e.g., hash voxel indices (`np.floor(world_points / voxel_size)`).
  - Quality metrics (mean conf, brightness).

### 6.3 Job artifacts

- Include selection diagnostics in `selected_frames.json`:

```json
{
  "frame_id": "...",
  "motion_distance": 0.45,
  "coverage_gain": 0.12,
  "decision": "coverage"
}
```

- Enables auditing the chosen frames.
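Producing one such entry can be sketched as follows (`selection_record` is a hypothetical helper, not part of the codebase; the field names and values follow the example above, and the frame id is invented):

```python
import json

def selection_record(frame_id, motion_distance, coverage_gain, decision):
    """One entry for selected_frames.json, explaining why a frame was kept."""
    return {
        "frame_id": frame_id,
        "motion_distance": round(float(motion_distance), 4),
        "coverage_gain": round(float(coverage_gain), 4),
        "decision": decision,  # "motion" or "coverage"
    }

payload = json.dumps(
    [selection_record("frame_000042", 0.45, 0.12, "coverage")], indent=2
)
```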
### 6.4 Two-pass pipeline hook

- Add a config flag (e.g., `STREAM3R_KEYFRAME_PREPASS`) to toggle a lightweight pre-pass.
- **Pre-pass steps:**
  1. Collect frames as usual.
  2. Run a reduced inference loop (camera head only or full Stream3R with artifact generation disabled) to gather motion and coverage metadata.
  3. Execute the key-frame selection algorithm to produce selected indices.
- **Main pass:**
  1. Filter `frame_records` to the selected indices.
  2. If the batch size is below a configured maximum, switch inference to full attention; otherwise remain in window mode.
  3. Run the full artifact pipeline (pointmaps, GLB, reinflation) on the reduced set.
  4. Persist selection diagnostics alongside artifacts.
- Provide a fallback path: if the pre-pass fails or returns too few frames, revert to the original sampling strategy so the job still succeeds.

---
## 7. Configuration & Defaults

| Setting | Description | Default |
|---------|-------------|---------|
| `STREAM3R_KEYFRAME_MOTION_THRESH` | Motion distance (m) to trigger new key | 0.3 |
| `STREAM3R_KEYFRAME_ROT_THRESH` | Rotation angle (rad) weight | 0.5 |
| `STREAM3R_KEYFRAME_MIN_GAP` | Minimum time gap (s) | 0.25 |
| `STREAM3R_KEYFRAME_MAX_GAP` | Max time between keys (s) | 2.0 |
| `STREAM3R_KEYFRAME_TOP_K` | Max number of key frames | 18 (overridable per payload) |
| `STREAM3R_KEYFRAME_MIN_GAIN` | Coverage gain stop threshold | 0.01 |
| `STREAM3R_KEYFRAME_CONF_THRESH` | Confidence threshold for coverage | 0.3 |

---
## 8. Validation Plan

1. **Quantitative**
   - Compare key frame counts vs. baseline (2 fps sampling).
   - Measure point coverage retention (% of original points represented by key frames).
   - Evaluate overlap with heuristic linear sampling (should be reduced).
2. **Qualitative**
   - Visual inspection: point cloud clutter reduction, better 2D scene graph clarity.
   - Spot-check key-frame artifacts (diagnostic metadata) to ensure decisions align with expectations.
3. **Performance**
   - Ensure coverage computations remain efficient (hash-based; track memory usage).
   - Add timing logs in `_compute_selected_frames`.

---
## 9. Future Extensions

- Integrate image-content heuristics (entropy, saliency) into coverage scoring.
- Multi-pass selection: first ensure 360° orientation coverage, then fill gaps.
- Adaptive budgets based on room size / path length (use total motion distance).
- Optionally, trigger reinflation of selected frames only for visualization.

---
**Deliverables**

1. Updated `_compute_selected_frames` with motion + coverage logic.
2. Supporting utilities for pose distance and coverage hashing.
3. Config hooks & optional environment variables.
4. Tests covering edge cases (no motion, tiny coverage gains, payload `top_k` override).
5. Documentation updates describing new behavior and tuning knobs.

---
stream3r/utils/__pycache__/visual_utils.cpython-311.pyc CHANGED

Binary files a/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc and b/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc differ
stream3r/utils/visual_utils.py CHANGED

```diff
@@ -325,6 +325,8 @@ def predictions_to_glb(
     reinflate_jitter_mode: str = "cube",
     reinflate_jitter_sigma: float = 0.35,
     reinflate_seed: int | None = None,
+    ceiling_percentile: float | None = None,
+    ceiling_margin: float = 0.05,
 ) -> trimesh.Scene:
     """
     Converts predictions to a 3D scene represented as a GLB file.
@@ -360,6 +362,8 @@ def predictions_to_glb(
         reinflate_jitter_mode (str): "cube" (uniform jitter) or "gaussian".
         reinflate_jitter_sigma (float): Jitter strength as a fraction of voxel size.
         reinflate_seed (Optional[int]): RNG seed for deterministic reinflation.
+        ceiling_percentile (Optional[float]): Remove points above this Z percentile (0-100).
+        ceiling_margin (float): Margin subtracted from percentile cutoff (meters).
 
     Returns:
         trimesh.Scene: Processed 3D scene containing point cloud and cameras
@@ -523,6 +527,23 @@ def predictions_to_glb(
         colors_rgb = colors_rgb[conf_mask]
         conf_used = conf[conf_mask]
 
+    if ceiling_percentile is not None and vertices_3d.size:
+        try:
+            percentile_value = float(ceiling_percentile)
+        except (TypeError, ValueError):
+            percentile_value = None
+        if percentile_value is not None and 0.0 < percentile_value < 100.0:
+            cutoff = float(np.percentile(vertices_3d[:, 2], percentile_value))
+            margin = float(max(0.0, ceiling_margin))
+            threshold = cutoff - margin
+            keep_mask = vertices_3d[:, 2] < threshold
+            if not np.any(keep_mask):
+                keep_mask = vertices_3d[:, 2] <= cutoff
+            if np.any(keep_mask) and np.count_nonzero(keep_mask) < vertices_3d.shape[0]:
+                vertices_3d = vertices_3d[keep_mask]
+                colors_rgb = colors_rgb[keep_mask]
+                conf_used = conf_used[keep_mask]
+
     if effective_voxel_size is not None and voxel_after_conf and vertices_3d.size:
         before_count = vertices_3d.shape[0]
         vertices_3d, colors_rgb, conf_used = voxel_reduce(
```
stream3r/worker/config.py CHANGED

```diff
@@ -118,6 +118,17 @@ class WorkerSettings:
     max_frames_per_job: int = 0
     default_job_timeout: int = 45 * 60
     upload_session_cache: bool = True
+    keyframe_prepass_enabled: bool = True
+    keyframe_motion_threshold: float = 0.4
+    keyframe_rotation_weight: float = 0.5
+    keyframe_min_gap_frames: int = 2
+    keyframe_max_gap_frames: int = 45
+    keyframe_default_top_k: int = 16
+    keyframe_coverage_confidence: float = 0.3
+    keyframe_coverage_voxel_size: float = 0.05
+    keyframe_coverage_max_points: int = 5000
+    keyframe_min_gain_ratio: float = 0.01
+    keyframe_full_mode_max_frames: int = 16
 
     @classmethod
     def from_env(cls) -> "WorkerSettings":
@@ -216,6 +227,39 @@ class WorkerSettings:
             "upload_session_cache": _env_bool(
                 "STREAM3R_UPLOAD_CACHE", base.upload_session_cache
             ),
+            "keyframe_prepass_enabled": _env_bool(
+                "STREAM3R_KEYFRAME_PREPASS", base.keyframe_prepass_enabled
+            ),
+            "keyframe_motion_threshold": float(
+                os.getenv("STREAM3R_KEYFRAME_MOTION_THRESH", base.keyframe_motion_threshold)
+            ),
+            "keyframe_rotation_weight": float(
+                os.getenv("STREAM3R_KEYFRAME_ROT_WEIGHT", base.keyframe_rotation_weight)
+            ),
+            "keyframe_min_gap_frames": _env_int(
+                "STREAM3R_KEYFRAME_MIN_GAP_FRAMES", base.keyframe_min_gap_frames
+            ),
+            "keyframe_max_gap_frames": _env_int(
+                "STREAM3R_KEYFRAME_MAX_GAP_FRAMES", base.keyframe_max_gap_frames
+            ),
+            "keyframe_default_top_k": _env_int(
+                "STREAM3R_KEYFRAME_TOP_K", base.keyframe_default_top_k
+            ),
+            "keyframe_coverage_confidence": float(
+                os.getenv("STREAM3R_KEYFRAME_CONF_THRES", base.keyframe_coverage_confidence)
+            ),
+            "keyframe_coverage_voxel_size": float(
+                os.getenv("STREAM3R_KEYFRAME_VOXEL_SIZE", base.keyframe_coverage_voxel_size)
+            ),
+            "keyframe_coverage_max_points": _env_int(
+                "STREAM3R_KEYFRAME_MAX_POINTS", base.keyframe_coverage_max_points
+            ),
+            "keyframe_min_gain_ratio": float(
+                os.getenv("STREAM3R_KEYFRAME_MIN_GAIN", base.keyframe_min_gain_ratio)
+            ),
+            "keyframe_full_mode_max_frames": _env_int(
+                "STREAM3R_KEYFRAME_FULL_MAX_FRAMES", base.keyframe_full_mode_max_frames
+            ),
         }
 
         return cls(**kwargs)
```
stream3r/worker/tasks.py
CHANGED
|
@@ -63,6 +63,13 @@ class FrameRecord:
|
|
| 63 |
metadata: dict[str, Any] = field(default_factory=dict)
|
| 64 |
|
| 65 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 66 |
class ProgressTracker:
|
| 67 |
"""Aggregates frame progress to percentage updates."""
|
| 68 |
|
|
@@ -542,6 +549,206 @@ def _write_selected_frames(
|
|
| 542 |
return runtime.storage.upload_file(local_file, key, content_type="application/json")
|
| 543 |
|
| 544 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 545 |
def _compute_selected_frames(
|
| 546 |
predictions: Mapping[str, np.ndarray],
|
| 547 |
frame_records: list[FrameRecord],
|
|
@@ -567,6 +774,44 @@ def _compute_selected_frames(
|
|
| 567 |
return result
|
| 568 |
|
| 569 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 570 |
def _save_scene_glb(
|
| 571 |
*,
|
| 572 |
runtime: WorkerRuntime,
|
|
@@ -576,6 +821,17 @@ def _save_scene_glb(
|
|
| 576 |
payload: Mapping[str, Any],
|
| 577 |
) -> str:
|
| 578 |
local_file = temp_dir / runtime.settings.scene_glb_filename
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 579 |
scene = predictions_to_glb(
|
| 580 |
dict(predictions),
|
| 581 |
conf_thres=float(payload.get("conf_thres", 3.0)),
|
|
@@ -586,6 +842,8 @@ def _save_scene_glb(
|
|
| 586 |
mask_sky=_as_bool(payload.get("mask_sky"), False),
|
| 587 |
target_dir=str(temp_dir),
|
| 588 |
prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
|
|
|
|
|
|
|
| 589 |
)
|
| 590 |
scene.export(file_obj=str(local_file))
|
| 591 |
key = runtime.storage.build_key(
|
|
@@ -760,6 +1018,22 @@ def _handle_pose_pointmap(
|
|
| 760 |
"frames": core["frames"],
|
| 761 |
}
|
| 762 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 763 |
result_url = _upload_result_record(
|
| 764 |
runtime=runtime,
|
| 765 |
scene_id=scene_id,
|
|
@@ -894,6 +1168,40 @@ def _execute_job(job_type: str, payload: Mapping[str, Any], handler: JobHandler)
|
|
| 894 |
temp_path = Path(tmp_dir)
|
| 895 |
frame_records = _collect_frames(runtime, scene_id, payload, temp_path)
|
| 896 |
log_progress(f"collected frames ({len(frame_records)} items)")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 897 |
cache_path = temp_path / runtime.settings.session_cache_filename if streaming else None
|
| 898 |
|
| 899 |
tracker = ProgressTracker(runtime, job_meta)
|
|
@@ -1022,8 +1330,13 @@ def _handle_model_build(
|
|
| 1022 |
|
| 1023 |
artifacts = dict(core["artifacts"])
|
| 1024 |
|
| 1025 |
-
|
| 1026 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1027 |
selected_frames_url = _write_selected_frames(
|
| 1028 |
runtime=runtime,
|
| 1029 |
scene_id=scene_id,
|
|
|
|
| 63 |
metadata: dict[str, Any] = field(default_factory=dict)
|
| 64 |
|
| 65 |
|
| 66 |
+
@dataclass(slots=True)
|
| 67 |
+
class KeyframeSelectionResult:
|
| 68 |
+
indices: list[int]
|
| 69 |
+
diagnostics: list[dict[str, Any]]
|
| 70 |
+
top_k: int
|
| 71 |
+
|
| 72 |
+
|
| 73 |
class ProgressTracker:
|
| 74 |
"""Aggregates frame progress to percentage updates."""
|
| 75 |
|
|
|
|
| 549 |
return runtime.storage.upload_file(local_file, key, content_type="application/json")
|
| 550 |
|
| 551 |
|
| 552 |
+
def _camera_poses(extrinsic: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
|
| 553 |
+
matrices = np.asarray(extrinsic, dtype=np.float64)
|
| 554 |
+
if matrices.ndim != 3 or matrices.shape[1:] != (3, 4):
|
| 555 |
+
raise ValueError("Extrinsic array must have shape (N, 3, 4)")
|
| 556 |
+
count = matrices.shape[0]
|
| 557 |
+
rotations = np.empty((count, 3, 3), dtype=np.float64)
|
| 558 |
+
translations = np.empty((count, 3), dtype=np.float64)
|
| 559 |
+
for idx in range(count):
|
| 560 |
+
mat = np.eye(4, dtype=np.float64)
|
| 561 |
+
mat[:3, :4] = matrices[idx]
|
| 562 |
+
cam_to_world = np.linalg.inv(mat)
|
| 563 |
+
rotations[idx] = cam_to_world[:3, :3]
|
| 564 |
+
translations[idx] = cam_to_world[:3, 3]
|
| 565 |
+
return rotations, translations
|
| 566 |
+
|
| 567 |
+
|
| 568 |
+
def _compute_motion_deltas(rotations: np.ndarray, translations: np.ndarray, rot_weight: float) -> np.ndarray:
|
| 569 |
+
count = rotations.shape[0]
|
| 570 |
+
deltas = np.zeros(count, dtype=np.float64)
|
| 571 |
+
if count <= 1:
|
| 572 |
+
return deltas
|
| 573 |
+
for idx in range(1, count):
|
| 574 |
+
delta_t = np.linalg.norm(translations[idx] - translations[idx - 1])
|
| 575 |
+
rel = rotations[idx - 1].T @ rotations[idx]
|
| 576 |
+
trace = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
|
| 577 |
+
delta_r = float(np.arccos(trace))
|
| 578 |
+
deltas[idx] = delta_t + rot_weight * delta_r
|
| 579 |
+
return deltas
|
| 580 |
+
|
| 581 |
+
|
| 582 |
+
def _hash_quantized_voxels(coords: np.ndarray) -> np.ndarray:
|
| 583 |
+
coords = coords.astype(np.int64, copy=False)
|
| 584 |
+
primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
|
| 585 |
+
return coords @ primes
|
| 586 |
+
|
| 587 |
+
|
| 588 |
+
def _frame_voxel_sets(
|
| 589 |
+
world_points: np.ndarray,
|
| 590 |
+
confidence: np.ndarray,
|
| 591 |
+
*,
|
| 592 |
+
threshold: float,
|
| 593 |
+
voxel_size: float,
|
| 594 |
+
max_points: int,
|
| 595 |
+
) -> tuple[list[set[int]], int]:
|
| 596 |
+
rng = np.random.default_rng(42)
|
| 597 |
+
frames = world_points.shape[0]
|
| 598 |
+
voxel_sets: list[set[int]] = []
|
| 599 |
+
global_union: set[int] = set()
|
| 600 |
+
if voxel_size <= 0.0:
|
| 601 |
+
return [set() for _ in range(frames)], 0
|
| 602 |
+
for idx in range(frames):
|
| 603 |
+
conf_frame = confidence[idx]
|
| 604 |
+
mask = conf_frame >= threshold
|
| 605 |
+
if not np.any(mask):
|
| 606 |
+
voxel_sets.append(set())
|
| 607 |
+
continue
|
| 608 |
+
points = world_points[idx][mask]
|
| 609 |
+
if points.shape[0] > max_points:
|
| 610 |
+
sample_idx = rng.choice(points.shape[0], max_points, replace=False)
|
| 611 |
+
points = points[sample_idx]
|
| 612 |
+
quantized = np.floor(points / voxel_size).astype(np.int64, copy=False)
|
| 613 |
+
hashes = np.unique(_hash_quantized_voxels(quantized))
|
| 614 |
+
voxel_set = set(int(v) for v in hashes.tolist())
|
| 615 |
+
voxel_sets.append(voxel_set)
|
| 616 |
+
global_union.update(voxel_set)
|
| 617 |
+
return voxel_sets, len(global_union)
|
| 618 |
+
|
| 619 |
+
|
| 620 |
+
def _select_motion_indices(
|
| 621 |
+
motion_deltas: np.ndarray,
|
| 622 |
+
*,
|
| 623 |
+
threshold: float,
|
| 624 |
+
min_gap: int,
|
| 625 |
+
max_gap: int,
|
| 626 |
+
) -> tuple[list[int], dict[int, dict[str, float]]]:
|
| 627 |
+
total_frames = motion_deltas.shape[0]
|
| 628 |
+
if total_frames == 0:
|
| 629 |
+
return [], {}
|
| 630 |
+
selected = [0]
|
| 631 |
+
diagnostics: dict[int, dict[str, float]] = {0: {"motion_delta": 0.0, "cum_motion": 0.0}}
|
| 632 |
+
cumulative = 0.0
|
| 633 |
+
gap = 0
|
| 634 |
+
for idx in range(1, total_frames):
|
| 635 |
+
delta = float(motion_deltas[idx])
|
| 636 |
+
cumulative += delta
|
| 637 |
+
gap += 1
|
| 638 |
+
if gap < max(1, min_gap):
|
| 639 |
+
continue
|
| 640 |
+
should_select = cumulative >= threshold
|
| 641 |
+
if max_gap > 0 and gap >= max_gap:
|
| 642 |
+
should_select = True
|
| 643 |
+
if should_select:
|
| 644 |
+
selected.append(idx)
|
| 645 |
+
diagnostics[idx] = {"motion_delta": delta, "cum_motion": cumulative}
|
| 646 |
+
cumulative = 0.0
|
| 647 |
+
gap = 0
|
| 648 |
+
if selected[-1] != total_frames - 1:
|
| 649 |
+
selected.append(total_frames - 1)
|
| 650 |
+
diagnostics.setdefault(total_frames - 1, {"motion_delta": float(motion_deltas[-1]), "cum_motion": cumulative})
|
| 651 |
+
return selected, diagnostics
|
| 652 |
+
|
| 653 |
+
|
| 654 |
+
def _select_keyframes_motion_coverage(
|
| 655 |
+
frame_records: list[FrameRecord],
|
| 656 |
+
predictions: Mapping[str, np.ndarray],
|
| 657 |
+
settings: WorkerSettings,
|
| 658 |
+
requested_top_k: int,
|
| 659 |
+
) -> KeyframeSelectionResult | None:
|
| 660 |
+
extrinsic = np.asarray(predictions.get("extrinsic"))
|
| 661 |
+
if extrinsic.size == 0:
|
| 662 |
+
return None
|
| 663 |
+
rotations, translations = _camera_poses(extrinsic)
|
| 664 |
+
motion_deltas = _compute_motion_deltas(rotations, translations, settings.keyframe_rotation_weight)
|
| 665 |
+
motion_indices, motion_diag = _select_motion_indices(
|
| 666 |
+
motion_deltas,
|
| 667 |
+
threshold=settings.keyframe_motion_threshold,
|
| 668 |
+
min_gap=max(1, settings.keyframe_min_gap_frames),
|
| 669 |
+
max_gap=max(0, settings.keyframe_max_gap_frames),
|
| 670 |
+
)
|
| 671 |
+
total_frames = len(frame_records)
|
| 672 |
+
confidence = _pose_confidence(predictions)
|
| 673 |
+
world_points = predictions.get("world_points")
|
| 674 |
+
if world_points is None:
|
| 675 |
+
world_points = predictions.get("world_points_from_depth")
|
| 676 |
+
voxel_sets: list[set[int]] = [set() for _ in range(total_frames)]
|
| 677 |
+
total_voxels = 0
|
| 678 |
+
mean_conf = np.zeros(total_frames, dtype=np.float32)
|
| 679 |
+
if confidence is not None:
|
| 680 |
+
mean_conf = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
|
| 681 |
+
if confidence is not None and world_points is not None:
|
| 682 |
+
voxel_sets, total_voxels = _frame_voxel_sets(
|
| 683 |
+
np.asarray(world_points),
|
| 684 |
+
np.asarray(confidence),
|
| 685 |
+
threshold=settings.keyframe_coverage_confidence,
|
| 686 |
+
voxel_size=settings.keyframe_coverage_voxel_size,
|
| 687 |
+
max_points=max(1000, settings.keyframe_coverage_max_points),
|
| 688 |
+
)
|
| 689 |
+
total_voxels = max(total_voxels, 1)
|
| 690 |
+
top_k = requested_top_k if requested_top_k > 0 else settings.keyframe_default_top_k
|
| 691 |
+
top_k = max(min(top_k, total_frames), len(motion_indices))
|
| 692 |
+
selected_set: set[int] = set(motion_indices)
|
| 693 |
+
diagnostics: dict[int, dict[str, Any]] = {}
|
| 694 |
+
covered: set[int] = set()
|
| 695 |
+
for idx in motion_indices:
|
| 696 |
+
gain_count = len(voxel_sets[idx] - covered) if voxel_sets[idx] else 0
|
| 697 |
+
gain_ratio = gain_count / total_voxels
|
| 698 |
+
covered.update(voxel_sets[idx])
|
| 699 |
+
diagnostics[idx] = {
|
| 700 |
+
"frame_id": frame_records[idx].frame_id,
|
| 701 |
+
"frame_index": frame_records[idx].index,
|
| 702 |
+
"reason": "motion",
|
| 703 |
+
"motion_delta": float(motion_deltas[idx]),
|
| 704 |
+
"cum_motion": float(motion_diag.get(idx, {}).get("cum_motion", 0.0)),
|
| 705 |
+
"coverage_gain_ratio": float(gain_ratio),
|
| 706 |
+
"coverage_gain_count": int(gain_count),
|
| 707 |
+
"mean_confidence": float(mean_conf[idx]) if confidence is not None else None,
|
| 708 |
+
}
|
| 709 |
+
if len(selected_set) < top_k and total_voxels > 0:
|
| 710 |
+
min_gain_ratio = settings.keyframe_min_gain_ratio
|
| 711 |
+
remaining = [i for i in range(total_frames) if i not in selected_set and voxel_sets[i]]
|
| 712 |
+
while remaining and len(selected_set) < top_k:
|
| 713 |
+
best_idx = -1
|
| 714 |
+
best_gain = -1
|
| 715 |
+
best_ratio = -1.0
|
| 716 |
+
for idx in remaining:
|
| 717 |
+
gain = len(voxel_sets[idx] - covered)
|
| 718 |
+
if gain <= 0:
|
| 719 |
+
continue
|
| 720 |
+
ratio = gain / total_voxels
|
| 721 |
+
if ratio > best_ratio or (np.isclose(ratio, best_ratio) and gain > best_gain):
|
| 722 |
+
best_idx = idx
|
| 723 |
+
best_gain = gain
|
| 724 |
+
best_ratio = ratio
|
| 725 |
+
if best_idx == -1 or best_ratio < min_gain_ratio:
|
| 726 |
+
break
|
| 727 |
+
selected_set.add(best_idx)
|
| 728 |
+
covered.update(voxel_sets[best_idx])
|
| 729 |
+
diagnostics[best_idx] = {
|
| 730 |
+
"frame_id": frame_records[best_idx].frame_id,
|
| 731 |
+
"frame_index": frame_records[best_idx].index,
|
| 732 |
+
"reason": "coverage",
|
| 733 |
+
"motion_delta": float(motion_deltas[best_idx]),
|
| 734 |
+
"cum_motion": float(motion_diag.get(best_idx, {}).get("cum_motion", 0.0)),
|
| 735 |
+
"coverage_gain_ratio": float(best_ratio),
|
| 736 |
+
"coverage_gain_count": int(best_gain),
|
| 737 |
+
"mean_confidence": float(mean_conf[best_idx]) if confidence is not None else None,
|
| 738 |
+
}
|
| 739 |
+
remaining.remove(best_idx)
|
| 740 |
+
if requested_top_k > 0 and len(selected_set) > requested_top_k:
|
| 741 |
+
coverage_candidates = [idx for idx in selected_set if diagnostics[idx]["reason"] == "coverage"]
|
| 742 |
+
coverage_candidates.sort(key=lambda idx: diagnostics[idx].get("coverage_gain_ratio", 0.0))
|
| 743 |
+
while len(selected_set) > requested_top_k and coverage_candidates:
|
| 744 |
+
drop_idx = coverage_candidates.pop(0)
|
| 745 |
+
selected_set.remove(drop_idx)
|
| 746 |
+
diagnostics.pop(drop_idx, None)
|
| 747 |
+
final_indices = sorted(selected_set)
|
| 748 |
+
final_diags = [diagnostics[idx] for idx in final_indices]
|
| 749 |
+
return KeyframeSelectionResult(indices=final_indices, diagnostics=final_diags, top_k=len(final_indices))
|
| 750 |
+
|
| 751 |
+
|
| 752 |
def _compute_selected_frames(
|
| 753 |
predictions: Mapping[str, np.ndarray],
|
| 754 |
frame_records: list[FrameRecord],
|
|
|
|
| 774 |
return result
|
| 775 |
|
| 776 |
|
| 777 |
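The coverage pass above is a greedy max-coverage heuristic: each round adds the frame whose voxel set contributes the most still-uncovered voxels, stopping once the best marginal gain falls below `keyframe_min_gain_ratio`. Stripped of the worker's diagnostics plumbing, the core idea can be sketched as a hypothetical standalone helper:

```python
def greedy_coverage(voxel_sets: list[set], top_k: int, min_gain_ratio: float) -> list[int]:
    """Greedily pick frame indices that maximize marginal voxel coverage."""
    total = len(set().union(*voxel_sets)) if voxel_sets else 0
    covered: set = set()
    selected: list[int] = []
    remaining = [i for i, s in enumerate(voxel_sets) if s]
    while remaining and len(selected) < top_k and total:
        # Pick the frame adding the most uncovered voxels (ties -> higher index).
        best_gain, best_idx = max((len(voxel_sets[i] - covered), i) for i in remaining)
        if best_gain <= 0 or best_gain / total < min_gain_ratio:
            break  # marginal gain too small to justify another key frame
        selected.append(best_idx)
        covered |= voxel_sets[best_idx]
        remaining.remove(best_idx)
    return selected
```

The production code additionally seeds `selected_set` with motion-picked frames and records per-frame diagnostics; this sketch only shows the greedy loop and the gain-ratio stopping rule.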
+def _run_keyframe_prepass(
+    *,
+    runtime: WorkerRuntime,
+    payload: Mapping[str, Any],
+    frame_records: list[FrameRecord],
+    mode: str,
+    streaming: bool,
+    window_size: int | None,
+) -> KeyframeSelectionResult | None:
+    if len(frame_records) <= 1:
+        return None
+    settings = runtime.settings
+    top_k_payload = _as_int(payload.get("prepass_top_k") or payload.get("top_k_frames") or payload.get("top_k"), 0)
+    try:
+        inference = run_stream3r_inference(
+            runtime=runtime,
+            image_paths=[record.path for record in frame_records],
+            mode=mode,
+            streaming=streaming,
+            cache_output_path=None,
+            progress_cb=None,
+            window_size=window_size if streaming and mode == "window" else None,
+        )
+    except Exception:
+        logger.exception("Keyframe pre-pass inference failed")
+        return None
+    try:
+        selection = _select_keyframes_motion_coverage(
+            frame_records,
+            inference.predictions,
+            settings,
+            requested_top_k=top_k_payload,
+        )
+    finally:
+        del inference
+    return selection
+
+
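The coverage signal consumed by `_select_keyframes_motion_coverage` is built from per-frame voxel occupancy of the predicted world points. A minimal sketch of how per-frame points could be quantized into voxel-key sets (the helper name and default voxel size are illustrative assumptions, not the worker's actual API):

```python
import numpy as np

def frame_voxel_set(world_points: np.ndarray, voxel_size: float = 0.1) -> set:
    """Quantize (N, 3) world points into a set of integer voxel keys."""
    keys = np.floor(np.asarray(world_points, dtype=np.float64) / voxel_size).astype(np.int64)
    return {tuple(int(v) for v in k) for k in keys}
```

Two frames' sets can then be compared directly with set difference, which is exactly the `voxel_sets[idx] - covered` operation the greedy coverage loop relies on.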
 def _save_scene_glb(
     *,
     runtime: WorkerRuntime,
     ...
     payload: Mapping[str, Any],
 ) -> str:
     local_file = temp_dir / runtime.settings.scene_glb_filename
+    ceiling_percentile = payload.get("ceiling_percentile")
+    try:
+        ceiling_percentile_value = float(ceiling_percentile) if ceiling_percentile is not None else None
+    except (TypeError, ValueError):
+        ceiling_percentile_value = None
+    ceiling_margin_value = payload.get("ceiling_margin")
+    try:
+        ceiling_margin_value = float(ceiling_margin_value) if ceiling_margin_value is not None else 0.05
+    except (TypeError, ValueError):
+        ceiling_margin_value = 0.05
+
     scene = predictions_to_glb(
         dict(predictions),
         conf_thres=float(payload.get("conf_thres", 3.0)),
         ...
         mask_sky=_as_bool(payload.get("mask_sky"), False),
         target_dir=str(temp_dir),
         prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
+        ceiling_percentile=ceiling_percentile_value,
+        ceiling_margin=ceiling_margin_value,
     )
     scene.export(file_obj=str(local_file))
     key = runtime.storage.build_key(
     ...
         "frames": core["frames"],
     }

+    selected_frames_payload = payload.get("_selected_frames_info")
+    if selected_frames_payload:
+        result_payload["selected_frames"] = list(selected_frames_payload)
+        try:
+            selected_frames_url = _write_selected_frames(
+                runtime=runtime,
+                scene_id=scene_id,
+                selected_frames=list(selected_frames_payload),
+                top_k=_as_int(payload.get("_selected_top_k"), len(selected_frames_payload)),
+                temp_dir=temp_dir,
+            )
+            if selected_frames_url:
+                result_payload["artifacts"]["selected_frames_url"] = selected_frames_url
+        except Exception:
+            logger.exception("Failed to persist selected frames artifact for pose_pointmap job")
+
     result_url = _upload_result_record(
         runtime=runtime,
         scene_id=scene_id,
     ...
     temp_path = Path(tmp_dir)
     frame_records = _collect_frames(runtime, scene_id, payload, temp_path)
     log_progress(f"collected frames ({len(frame_records)} items)")
+    selection_result: KeyframeSelectionResult | None = None
+    if runtime.settings.keyframe_prepass_enabled and len(frame_records) > 1:
+        log_progress("starting keyframe pre-pass")
+        try:
+            selection_result = _run_keyframe_prepass(
+                runtime=runtime,
+                payload=payload,
+                frame_records=frame_records,
+                mode=mode,
+                streaming=streaming,
+                window_size=window_size,
+            )
+        except Exception:
+            selection_result = None
+            logger.exception("Keyframe pre-pass failed; falling back to full frame set")
+        if selection_result and selection_result.indices:
+            log_progress(
+                f"pre-pass selected {len(selection_result.indices)} frames from {len(frame_records)}"
+            )
+            frame_records = [frame_records[i] for i in selection_result.indices]
+            for new_idx, record in enumerate(frame_records):
+                record.index = new_idx
+            payload["_selected_frames_info"] = selection_result.diagnostics
+            payload["_selected_top_k"] = selection_result.top_k
+            payload["_selected_frame_indices"] = selection_result.indices
+            if len(frame_records) <= runtime.settings.keyframe_full_mode_max_frames:
+                mode = "full"
+                streaming = False
+                window_size = None
+                payload["mode"] = mode
+                payload["streaming"] = streaming
+        else:
+            selection_result = None
+
     cache_path = temp_path / runtime.settings.session_cache_filename if streaming else None

     tracker = ProgressTracker(runtime, job_meta)
     ...
     artifacts = dict(core["artifacts"])

+    selected_frames_payload = payload.get("_selected_frames_info")
+    if selected_frames_payload:
+        top_k = _as_int(payload.get("_selected_top_k"), len(selected_frames_payload))
+        selected_frames = list(selected_frames_payload)
+    else:
+        top_k = _as_int(payload.get("top_k_frames") or payload.get("top_k"), 0)
+        selected_frames = _compute_selected_frames(predictions, frame_records, top_k)
     selected_frames_url = _write_selected_frames(
         runtime=runtime,
         scene_id=scene_id,
tests/test_voxel_reduction.py
CHANGED

@@ -177,3 +177,48 @@ def test_density_filter_points_removes_isolated_samples():

     assert filtered_points.shape[0] < points.shape[0]
     assert np.all(filtered_points.max(axis=0) < 0.2)
+
+
+def test_predictions_to_glb_ceiling_filter():
+    world_points = np.array(
+        [
+            [
+                [[0.0, 0.0, 0.0], [0.0, 0.0, 1.5]],
+                [[0.0, 0.0, 1.6], [0.0, 0.0, 1.7]],
+            ]
+        ],
+        dtype=np.float32,
+    )
+    predictions = {
+        "world_points": world_points,
+        "world_points_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "world_points_from_depth": world_points,
+        "depth_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "images": np.ones((1, 2, 2, 3), dtype=np.float32) * 0.5,
+        "extrinsic": np.array(
+            [
+                [
+                    [1.0, 0.0, 0.0, 0.0],
+                    [0.0, 1.0, 0.0, 0.0],
+                    [0.0, 0.0, 1.0, 0.0],
+                ]
+            ],
+            dtype=np.float32,
+        ),
+    }
+
+    scene = predictions_to_glb(
+        predictions,
+        conf_thres=0.0,
+        voxel_size=None,
+        o3d_denoise=False,
+        density_filter=False,
+        reinflate_enabled=False,
+        ceiling_percentile=90.0,
+        ceiling_margin=0.05,
+    )
+
+    assert isinstance(scene, trimesh.Scene)
+    point_cloud = next(iter(scene.geometry.values()))
+    max_z = point_cloud.vertices[:, 2].max()
+    assert max_z < 1.6