Spaces: dwellbot/dwellbot_stream3r


brian4dwell committed on Nov 4, 2025

Commit

31dbf53

1 Parent(s): f1e0138

voxel reduction


Files changed (10)

design_docs/post_reduction_meshing.md +120 -0
design_docs/splat_rendering.md +124 -0
design_docs/voxel_reduction.md +212 -0
design_docs/voxel_reinflation.md +106 -0
requirements.txt +1 -0
stream3r/utils/__pycache__/visual_utils.cpython-311.pyc +0 -0
stream3r/utils/visual_utils.py +281 -6
stream3r/worker/config.py +7 -3
stream3r/worker/tasks.py +44 -2
tests/test_voxel_reduction.py +114 -0

design_docs/post_reduction_meshing.md ADDED

	@@ -0,0 +1,120 @@

+```markdown
+# Design Doc: Post-Reduction Meshing for Clean Surface Export
+**Author:** Brian Clark
+**Last Updated:** 2025-11-07
+**Target Component:** Optional stage after `predictions_to_glb` point processing
+**Goal:** Convert cleaned point clouds into lightweight surface meshes for users who prefer shaded geometry over splatty point clouds.
+---
+## 1. Overview
+After confidence filtering, voxel reduction, and denoising we hold a sparse, high-confidence point cloud.
+This design adds an optional meshing stage (ball-pivoting, Poisson, marching cubes) to produce a triangle mesh that can be exported alongside or instead of the GLB point cloud.
+---
+## 2. Use Cases
+- Interactive viewers requiring solid surfaces (lighting, shadowing).
+- Downstream pipelines that expect meshes (CAD tools, 3D printing previews).
+- Scenes where splatty points look sparse even after reinflation.
+We do **not** aim for watertight printable models—just visually continuous surfaces.
+---
+## 3. Meshing Strategies
+| Method | Pros | Cons | Dependencies |
+| ------ | ---- | ---- | ------------ |
+| **Ball-Pivoting (BPA)** | Stable for uneven sampling, preserves detail | Needs normals, parameter tuning | Open3D |
+| **Poisson Reconstruction** | Smooth surfaces, fills gaps | Blurs thin structures, more compute | Open3D |
+| **Marching Cubes on fused depth** | Works without normals | Requires voxel grid, might need extra fusion step | Open3D / custom |
+Initial plan: start with Open3D’s **Ball-Pivoting**, fall back to Poisson when BPA fails.
+---
+## 4. Pipeline Integration
+1. Existing point-cleaning (voxel, support filtering, optional reinflation).
+2. Normal estimation (Open3D `estimate_normals`, with KD-tree search radius tied to voxel size).
+3. Mesh reconstruction (BPA default, Poisson fallback).
+4. Mesh simplification (`simplify_quadric_decimation`) to hit target face count.
+5. Export as GLB/OBJ/PLY; attach materials/colors using per-vertex colors (sampled from original points).
+### Configuration options
+| Option | Default | Notes |
+| ------ | ------- | ----- |
+| `meshing_enabled` | `False` | Opt-in feature |
+| `meshing_method` | `"ball_pivoting"` | `"poisson"` available |
+| `bpa_radii` | `[voxel_size, 2×, 4×]` | Radii list for BPA |
+| `poisson_depth` | 8 | Tree depth, controls detail |
+| `target_face_count` | 200k | Post-simplification triangle budget |
+| `keep_point_cloud` | `True` | Export both mesh + original cloud |
+---
+## 5. Implementation Plan
+### 5.1 Helper module
+`stream3r/utils/mesh_utils.py`
+```python
+def build_surface_mesh(points, colors, *, config) -> trimesh.Trimesh:
+    # open3d conversions, normal estimation
+    # reconstruction via BPA/Poisson
+    # color baking (nearest neighbor)
+```
+### 5.2 Export integration
+In `_generate_core_outputs` after point cloud saved:
+```python
+if settings.meshing_enabled:
+    mesh = build_surface_mesh(vertices_3d, colors_rgb, config=settings)
+    mesh_url = _save_mesh(runtime, scene_id, mesh, temp_dir)
+    artifacts["mesh_url"] = mesh_url
+```
+### 5.3 Storage
+- Upload under `models/meshes/scene_mesh.glb`
+- Optionally provide OBJ + MTL for compatibility.
+---
+## 6. Validation
+1. Compare GLB mesh size vs. original point cloud.
+2. Manual visual QA (Viewer, Blender) to spot holes or artifacts.
+3. Automated checks:
+   - Mesh exists and contains triangles.
+   - Face count under budget (`target_face_count`).
+   - Color channels preserved (mean deviation < ε).
+4. Benchmark runtime per scene and ensure it fits within job timeouts.
+---
+## 7. Risks & Mitigations
+| Risk | Mitigation |
+| ---- | ---------- |
+| Thin structures lost | Tune BPA radii; detect failure and revert to point cloud |
+| Open3D dependency bloat | Gate meshing behind `pip install open3d`; log when unavailable |
+| Runtime overhead | Make stage optional; expose `meshing_timeout` |
+| Large meshes | Apply decimation & optional texture baking |
+---
+## 8. Deliverables
+1. Helper module for meshing + tests.
+2. Scene artifacts update (mesh export, metadata).
+3. New config flags (`STREAM3R_MESHING_ENABLED`, etc.).
+4. Documentation/tutorial for users toggling the mesh output.
+---
+**Outcome:** An optional mesh artifact that gives viewers a solid-looking scene without fully abandoning the point-based pipeline.
+```
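
The configuration table in Section 4 of the meshing design could be grouped roughly as follows. This is a minimal sketch, assuming a plain dataclass: the name `MeshingConfig` and its two helper methods are hypothetical and do not appear in the commit. Only the option names and defaults come from the table, and treating BPA as the fallback when Poisson is the preferred method is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshingConfig:
    """Hypothetical container for the options listed in the design doc."""

    meshing_enabled: bool = False          # opt-in feature
    meshing_method: str = "ball_pivoting"  # "poisson" is the alternative
    poisson_depth: int = 8                 # octree depth, controls detail
    target_face_count: int = 200_000       # post-simplification budget
    keep_point_cloud: bool = True          # export mesh and original cloud

    def bpa_radii(self, voxel_size: float) -> list[float]:
        """Radii list [voxel_size, 2x, 4x] for ball pivoting."""
        if voxel_size <= 0:
            raise ValueError("voxel_size must be positive")
        return [voxel_size * m for m in (1.0, 2.0, 4.0)]

    def method_order(self) -> list[str]:
        """Preferred method first, the remaining method as fallback."""
        methods = ["ball_pivoting", "poisson"]
        if self.meshing_method not in methods:
            raise ValueError(f"unknown meshing_method: {self.meshing_method!r}")
        return [self.meshing_method] + [m for m in methods if m != self.meshing_method]
```

Deriving the radii from `voxel_size` keeps the BPA ball scale tied to the sampling density left behind by voxel reduction, which is what the table's `[voxel_size, 2×, 4×]` default implies.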

design_docs/splat_rendering.md ADDED

	@@ -0,0 +1,124 @@

+```markdown
+# Design Doc: Splat-Aware Export with Per-Point Radii & View-Dependent Color
+**Author:** Brian Clark
+**Last Updated:** 2025-11-07
+**Target Components:** Export pipeline (`predictions_to_glb`), downstream visualization stack
+**Goal:** Preserve the rich “Gaussian splat” appearance by exporting per-point radius/orientation/color information and using a splat-aware renderer instead of vanilla GLB point clouds.
+---
+## 1. Overview
+The current GLB export stores XYZ + RGB points. Rendering them as fixed-size pixels loses the soft, blended look achieved by overlapping splats in the original predictions.
+This design introduces a splat representation (Gaussian or disk) with explicit radius and optional anisotropy, and changes the rendering path to honor those attributes.
+---
+## 2. Requirements
+1. **Data export**
+   - Include at least:
+     - Center (`x,y,z`)
+     - Radius (`r`) or covariance matrix (`Σ`)
+     - Base color (RGB) and optional view-dependent coefficients (e.g., SH).
+   - Allow fallback to disk splats (isotropic) if covariance unavailable.
+2. **File format**
+   - Options:
+     - Extend GLB with custom vertex attributes + custom shader (three.js, deck.gl).
+     - Use established Gaussian splat formats (e.g., `.splat`, `.ply` with extra attributes, or `gaussian-splatting` binary).
+   - Include metadata describing attribute semantics for the viewer.
+3. **Renderer**
+   - Web: Three.js shader or deck.gl layer capable of drawing splats.
+   - Native: Optionally integrate with existing Gaussian splatting viewers (Instant-NGP, Splatfacto, etc.).
+4. **Performance**
+   - Maintain reasonable splat counts (prefiltered via voxel reduction & density tests).
+   - Provide LOD strategies (e.g., radius-aware culling, multi-resolution export).
+---
+## 3. Proposed Pipeline
+### 3.1 Data preparation
+1. Start with voxel-reduced points.
+2. Estimate per-point covariance:
+   - Option A: Use original support’s covariance (requires storing full neighbor set).
+   - Option B: Compute isotropic radius from average neighbor distance (k-NN) or voxel size.
+3. Store color as linear RGB; optionally store SH coefficients computed during inference (if available).
+### 3.2 Export formats
+| Format | Pros | Cons |
+| ------ | ---- | ---- |
+| **GLB + custom vertex attributes** | Simple integration with existing pipeline | Requires custom shader; not portable |
+| **.splat binary (Gaussian Splatting)** | Compatibility with emerging viewers | New dependency; not widely standardized |
+| **PLY/NPZ with attributes** | Readable, quick prototyping | Requires loader adaptation |
+Initial recommendation: use GLB with custom attributes (e.g., `RADIUS`, `COV3x3`) and provide sample shaders for three.js.
+### 3.3 Rendering changes
+1. Publish a three.js script that:
+   - Loads GLB.
+   - Extracts attributes into buffers.
+   - Renders via custom shader (screen-space splatting).
+2. Provide deck.gl example using `PointCloudLayer` with `pointSize` (formerly `radiusPixels`) or custom shader module.
+3. Eventually support native Gaussian Splat format for plug-and-play compatibility.
+---
+## 4. API changes
+Add new export options:
+- `export_mode: {"glb", "splat_glb", "gaussian_binary"}`
+- `splat_settings` (radius multiplier, anisotropy toggle, max anisotropy)
+- `lod_strategy` (none, subsample, multi-file)
+Ensure backward compatibility by keeping current GLB path as default.
+---
+## 5. Validation & Testing
+1. Compare visual output in existing viewer vs. splat viewer (visual parity).
+2. Measure load time, FPS for large scenes.
+3. Unit tests for:
+   - Attribute packing/unpacking.
+   - Radius estimation.
+   - Export file integrity.
+4. Integration test: pipeline -> viewer round-trip (screenshot diff, metrics).
+---
+## 6. Risks & Mitigations
+| Risk | Mitigation |
+| ---- | ---------- |
+| Custom GLB attributes unsupported by some tools | Provide fallback GLB path, document requirements |
+| Increased file size due to extra attributes | Leverage quantization/compression, allow isotropic mode |
+| Viewer complexity | Ship reference shader + deck.gl layer; adopt existing open-source splat renderer |
+| Lack of covariance data | Start with isotropic radii derived from voxel size |
+---
+## 7. Timeline (rough)
+1. Week 1: Data prep & attribute computation (radius/covariance).
+2. Week 2: GLB exporter modifications + tests.
+3. Week 3: Viewer shader integration, documentation.
+4. Week 4: Optional Gaussian binary export + performance tuning.
+---
+## 8. Deliverables
+1. Updated export pipeline producing splat-aware asset.
+2. Viewer example repo (three.js + deck.gl).
+3. Documentation covering format, tuning knobs, and integration steps.
+4. Automated tests validating attribute correctness & rendering.
+---
+```
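
Option B in Section 3.1 of the splat design (an isotropic radius from the average neighbor distance) might look like the sketch below. It assumes NumPy only and uses brute-force pairwise distances, so it is suitable for small illustrative clouds rather than full scenes. The function name `isotropic_radii` and the `fallback` parameter are hypothetical.

```python
import numpy as np


def isotropic_radii(points: np.ndarray, k: int = 8, fallback: float = 0.02) -> np.ndarray:
    """Per-point radius as the mean distance to the k nearest neighbours.

    Brute-force O(n^2) distances; a KD-tree would be the practical choice
    for real scenes.
    """
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    n = pts.shape[0]
    if n < 2:
        # No neighbours to measure: fall back to a voxel-size-based radius.
        return np.full(n, fallback, dtype=np.float32)
    k = max(1, min(k, n - 1))
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    # The first k columns after partitioning hold the k smallest distances.
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1).astype(np.float32)
```

The resulting array could be written out as the `RADIUS` attribute mentioned in Section 3.2, with the voxel size serving as the fallback when a point has no neighbors.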

design_docs/voxel_reduction.md ADDED

	@@ -0,0 +1,212 @@

+```markdown
+# Design Doc: Stream3R → Clean GLB Export with Voxel Reduction + Open3D Outlier Removal
+**Author:** Brian Clark
+**Last Updated:** 2025-11-03
+**Target Component:** `predictions_to_glb()` (Stream3R repo)
+**Goal:** Integrate a high-fidelity voxel reduction and denoising stage for cleaner, lighter `.glb` outputs suitable for downstream r3f visualization.
+---
+## 1. Overview
+The existing `predictions_to_glb()` function builds a large unfiltered point cloud directly from Stream3R predictions.
+This design adds two optional cleanup stages that:
+- **(1) Voxel-reduce** redundant points into weighted centroids.
+- **(2) Filter residual noise** with Open3D’s statistical & radius outlier removal.
+This yields cleaner geometry, lower GLB size, and faster loading, while preserving color and geometry fidelity.
+---
+## 2. Key Additions
+### 2.1 New helpers
+#### `voxel_reduce(points_f32, colors_u8, conf_f32, voxel_size)`
+- Merges points falling within the same voxel grid cell.
+- Weighted average of position, color (in *linear* space), and optional confidence.
+- Returns `(points, colors, support)`.
+#### `o3d_outlier_filter(points_f32, colors_u8, voxel_size, radius_mult, nb_points, nb_neighbors, std_ratio)`
+- Converts NumPy arrays → Open3D `PointCloud`.
+- Applies:
+  1. **Radius Outlier Removal:** ensures local density.
+  2. **Statistical Outlier Removal:** drops sparse noise.
+- Converts filtered result back to NumPy + `uint8` sRGB.
+### 2.2 Optional parameters added to `predictions_to_glb()`
+| Parameter | Type | Default | Description |
+|------------|------|----------|--------------|
+| `voxel_size` | float \| None | None | Enables voxel reduction if >0 |
+| `voxel_after_conf` | bool | True | Reduce after confidence & mask filtering |
+| `o3d_denoise` | bool | True | Enables Open3D outlier filtering |
+| `o3d_params` | dict \| None | None | Override Open3D defaults (e.g. radius_mult) |
+### 2.3 Processing order
+```
+predictions → confidence/bg masking
+→ [optional] voxel_reduce()
+→ [optional] o3d_outlier_filter()
+→ trimesh.PointCloud → GLB
+```
+Both stages are fully optional; default behavior is unchanged.
+---
+## 3. Implementation Plan
+### 3.1 Import dependencies
+```python
+import open3d as o3d     # optional heavy dependency
+import numpy as np
+import trimesh
+```
+### 3.2 Helper: `voxel_reduce()`
+```python
+def voxel_reduce(points_f32, colors_u8, conf_f32=None, voxel_size=0.02, origin=None):
+    # sRGB→linear, weighted average, linear→sRGB
+    # Hash each voxel using large primes (reduces, but cannot rule out, collisions)
+    # Return reduced arrays
+```
+### 3.3 Helper: `o3d_outlier_filter()`
+```python
+def o3d_outlier_filter(points_f32, colors_u8,
+                       voxel_size=0.02,
+                       radius_mult=3.0,
+                       nb_points=16,
+                       nb_neighbors=48,
+                       std_ratio=1.5):
+    # Construct Open3D cloud, remove outliers, return filtered arrays
+```
+### 3.4 Patch in `predictions_to_glb()`
+Insert after confidence masking:
+```python
+if voxel_size is not None and voxel_size > 0:
+    vertices_3d, colors_rgb, _support = voxel_reduce(
+        vertices_3d, colors_rgb, conf_f32=conf_used, voxel_size=float(voxel_size)
+    )
+if o3d_denoise and vertices_3d.size:
+    params = dict(
+        voxel_size=float(voxel_size or 0.02),
+        radius_mult=3.0,
+        nb_points=16,
+        nb_neighbors=48,
+        std_ratio=1.5,
+    )
+    if o3d_params: params.update(o3d_params)
+    vertices_3d, colors_rgb = o3d_outlier_filter(vertices_3d, colors_rgb, **params)
+```
+The rest of the GLB creation (scene scale, camera meshes, alignment) remains unchanged.
+---
+## 4. Default Parameters & Behavior
+| Context            | Setting                                 | Recommended          |
+| ------------------ | --------------------------------------- | -------------------- |
+| Indoor scenes      | `voxel_size=0.02`                       | 2 cm grid            |
+| Fast preview       | `voxel_size=0.06`                       | Coarse 6 cm grid     |
+| Radius filter      | `radius = 3×voxel_size`                 | 0.06 m for 2 cm grid |
+| Statistical filter | `nb_neighbors=48`, `std_ratio=1.5`      | Safe defaults        |
+| Weighting          | Confidence scores (`world_points_conf`) | Use for averages     |
+---
+## 5. Expected Outcomes
+| Metric                 | Before    | After            |
+| ---------------------- | --------- | ---------------- |
+| GLB file size          | 1×        | ↓ 3–8×           |
+| Visual duplicates      | High      | Minimal          |
+| Noise/speckle          | Frequent  | Strongly reduced |
+| Load time (r3f viewer) | Long      | Near-instant     |
+| Fidelity               | Unchanged | Preserved        |
+---
+## 6. Validation Steps
+1. **Run baseline:**
+   `predictions_to_glb(preds, voxel_size=None, o3d_denoise=False)`
+   → export size / load time baseline.
+2. **Run optimized:**
+   `predictions_to_glb(preds, voxel_size=0.02, o3d_denoise=True)`
+   → compare GLB size, visual quality, and FPS in viewer.
+3. **Stress-test:**
+   * High-conf scenes with many frames.
+   * Scenes with thin structures (shelves, walls).
+   * Ensure no noticeable geometric bias or color shift.
+---
+## 7. Future Extensions
+| Feature               | Description                                                        |
+| --------------------- | ------------------------------------------------------------------ |
+| **Normals averaging** | Extend `voxel_reduce()` to merge normals & store as GLB attributes |
+| **Support weighting** | Save per-voxel support count → possible LOD weighting              |
+| **Covariance export** | Optionally compute per-voxel covariance for Gaussian splats        |
+| **Tile-based batch**  | Enable out-of-core fusion for huge rooms                           |
+| **Dual GLB export**   | Auto-save coarse (preview) + fine (full-res) versions              |
+---
+## 8. Example Usage
+```python
+scene = predictions_to_glb(
+    preds,
+    conf_thres=50.0,
+    mask_white_bg=True,
+    voxel_size=0.02,
+    o3d_denoise=True,
+    o3d_params={"nb_neighbors": 64, "std_ratio": 1.3}
+)
+trimesh.exchange.gltf.export_glb(scene, "room_clean.glb")
+```
+---
+## 9. Deliverables for Codex Implementation
+1. **New helper functions**
+   * `voxel_reduce()`
+   * `o3d_outlier_filter()`
+2. **Modified signature** of `predictions_to_glb()` to include new optional args.
+3. **Integration** of both steps before `trimesh.PointCloud`.
+4. **Minimal dependency injection**
+   (`open3d` imported lazily; safe fail if missing).
+5. **Unit test / validation script**
+   * Compare point counts & file sizes pre-/post-cleanup.
+   * Assert geometry type remains `PointCloud`.
+---
+**Outcome:**
+A drop-in replacement for `predictions_to_glb()` producing visually near-identical but much smaller and cleaner `.glb` point clouds for Stream3R → r3f workflows.
+```
+```
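
The voxel reduction design averages colors in linear space rather than directly on sRGB bytes. The following standard-library sketch illustrates why that matters for a single channel. It uses the same sRGB transfer constants as the `_srgb_to_linear` and `_linear_to_srgb` helpers in this commit, but the scalar functions and the name `blend_u8` are hypothetical and only for illustration.

```python
def srgb_to_linear(c: float) -> float:
    """Decode an sRGB channel value in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode a linear-light channel value in [0, 1] back to sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def blend_u8(a: int, b: int, wa: float = 1.0, wb: float = 1.0) -> int:
    """Weighted average of two 8-bit sRGB channel values in linear light."""
    la = srgb_to_linear(a / 255.0)
    lb = srgb_to_linear(b / 255.0)
    mixed = (wa * la + wb * lb) / (wa + wb)
    return int(round(linear_to_srgb(mixed) * 255.0))


# Equal-weight blend of black and white: averaging the raw bytes would give
# about 128, while averaging in linear light gives a visibly brighter 188.
print(blend_u8(0, 255))
```

Averaging raw sRGB bytes would darken merged voxels that mix bright and dark samples. The weights `wa` and `wb` play the role of the per-point confidences used by `voxel_reduce()`.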

design_docs/voxel_reinflation.md ADDED

	@@ -0,0 +1,106 @@

+```markdown
+# Design Doc: Voxel Reinflation to Recover Dense Splat Coverage
+**Author:** Brian Clark
+**Last Updated:** 2025-11-07
+**Target Component:** `predictions_to_glb()` (Stream3R repo)
+**Goal:** Restore the “splatty” look of point clouds after voxel reduction by respawning micro-clusters around each reduced voxel center without bloating GLB size back to the original.
+---
+## 1. Overview
+The current voxel reduction keeps one weighted centroid per cell. Great for size, bad for perceived density.
+This design “re-inflates” each voxel into a small jittered cluster proportional to its support, so standard GLB point rendering still looks like a filled surface.
+---
+## 2. Key Concepts
+### 2.1 Inputs from existing pipeline
+- `voxel_reduce()` already returns:
+  - `points`: voxel centroids
+  - `support`: sum of confidences (or point counts) per voxel
+- We can use `support` to estimate how many original samples a voxel represents.
+### 2.2 Re-inflation strategy
+- Emit `k = clamp(round(alpha * support), min_samples, max_samples)` points per voxel.
+- Each point = centroid + jitter sampled uniformly within the voxel cube (or Gaussian with σ tied to voxel size).
+- Jittered colors copied from centroid (optionally add mild hue jitter for variety).
+- Optional normal estimation by reusing local PCA from original neighbors (future extension).
+### 2.3 Controls
+| Parameter | Description | Default |
+| --------- | ----------- | ------- |
+| `reinflate_enabled` | Toggle the entire stage | `False` (opt-in) |
+| `support_scale` | α multiplier converting support → sample count | 0.5 |
+| `min_samples` | Minimum points per voxel | 1 |
+| `max_samples` | Ceiling per voxel to cap explosion | 12 |
+| `jitter_mode` | `cube` (uniform) or `gaussian` | `cube` |
+| `jitter_sigma` | For gaussian mode; fraction of voxel size | 0.35 |
+| `seed` | Deterministic RNG seed for reproducibility | 0 (disabled) |
+---
+## 3. Implementation Plan
+### 3.1 Helper function
+```python
+def reinflate_voxels(points, colors, support, voxel_size, *, config):
+    # Determine sample counts per voxel
+    # Generate jitter offsets (vectorized, deterministic optional)
+    # Repeat colors + points with offsets
+    # Return expanded arrays
+```
+- Use `np.repeat` to build index arrays, avoid Python loops.
+- Optional RNG seeded via `np.random.default_rng`.
+### 3.2 Integration point
+Insert after voxel reduction & support filtering, before Open3D/density filter.
+Why? We want re-inflated points to still be denoised and deduped if needed.
+```
+if reinflate_enabled and vertices_3d.size:
+    vertices_3d, colors_rgb = reinflate_voxels(..., support=conf_used, ...)
+```
+### 3.3 Performance considerations
+- Re-inflation runs in-memory; pre-allocate output arrays (`np.empty`) and fill them with vectorized operations rather than Python loops.
+- Keep `max_samples` conservative to prevent 2 GB GLBs.
+- Skip stage when `support` missing (e.g., streaming mode without confidences).
+---
+## 4. Expected Outcomes
+| Metric | Before | After |
+| ------ | ------ | ----- |
+| GLB size | ~0.5× raw | ~0.7× raw (depends on support_scale) |
+| Visual density | Sparse | Near original “splat” look |
+| Compute cost | negligible | + small vectorized jitter step |
+---
+## 5. Validation Plan
+1. Compare point counts vs. original per-scene.
+2. Visual inspection in r3f viewer for surface coverage.
+3. Ensure jitter uses consistent seed for reproducible exports.
+4. Capture metrics: GLB size, avg points per voxel, render FPS.
+---
+## 6. Future Extensions
+- Encode support as per-point radius attribute for richer viewers.
+- Adaptive jitter: anisotropic offsets aligned with local normals.
+- Auto-tune `support_scale` based on desired point budget.
+---
+**Deliverables**
+1. `reinflate_voxels()` helper with tests.
+2. Config plumbing (env vars & `WorkerSettings`).
+3. Integration hooks + logging of pre/post point counts.
+4. Update docs / README to explain new options.
+---
+```
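
The re-inflation strategy in Section 2.2 of the reinflation design (sample count from support, then uniform jitter in `cube` mode) can be sketched as below. This is an illustration under stated assumptions, not the helper from the commit: the name `reinflate_sketch` and its keyword arguments are hypothetical, `seed` is always passed to the RNG here, and `np.rint` rounds halves to the nearest even integer, which may differ from the rounding the author intends.

```python
import numpy as np


def reinflate_sketch(points, colors, support, voxel_size, *,
                     support_scale=0.5, min_samples=1, max_samples=12, seed=0):
    """Expand each voxel centroid into a small jittered cluster (cube mode)."""
    points = np.asarray(points, dtype=np.float32).reshape(-1, 3)
    colors = np.asarray(colors, dtype=np.uint8).reshape(-1, 3)
    support = np.asarray(support, dtype=np.float32).reshape(-1)

    # k = clamp(round(alpha * support), min_samples, max_samples)
    counts = np.clip(np.rint(support_scale * support),
                     min_samples, max_samples).astype(np.int64)

    # Source-voxel index for every emitted point, built without Python loops.
    src = np.repeat(np.arange(points.shape[0]), counts)

    # Uniform offsets within +/- half a voxel on each axis around the centroid.
    rng = np.random.default_rng(seed)
    jitter = rng.uniform(-0.5, 0.5, size=(src.shape[0], 3)) * voxel_size

    out_points = points[src] + jitter.astype(np.float32)
    out_colors = colors[src]  # colors are copied from the centroid unchanged
    return out_points, out_colors, counts
```

Because the jitter is centered on the weighted centroid rather than the cell center, emitted points can land slightly outside the original grid cell. Whether that matters depends on how the downstream density filter is tuned.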

requirements.txt CHANGED

@@ -42,6 +42,7 @@ seaborn
 pyglet<2
 huggingface-hub[torch]>=0.22
 spaces
 # --------- worker --------- #
 redis

 pyglet<2
 huggingface-hub[torch]>=0.22
 spaces
+open3d
 # --------- worker --------- #
 redis

stream3r/utils/__pycache__/visual_utils.cpython-311.pyc CHANGED

Binary files a/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc and b/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc differ

stream3r/utils/visual_utils.py CHANGED

@@ -4,14 +4,213 @@
 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
-import trimesh
-import numpy as np
-import matplotlib
-from scipy.spatial.transform import Rotation
 import copy
-import cv2
 import os
 import requests
 def predictions_to_glb(
@@ -26,6 +225,13 @@ def predictions_to_glb(
     prediction_mode="Predicted Pointmap",
     extra_cameras=None,
     extra_camera_color=(255, 0, 0),
 ) -> trimesh.Scene:
     """
     Converts predictions to a 3D scene represented as a GLB file.
@@ -47,6 +253,13 @@ def predictions_to_glb(
         extra_cameras (Optional[List[np.ndarray]]): Additional camera extrinsics (3x4 or 4x4)
             to visualize even when show_cam=False. Useful for highlighting localized poses.
         extra_camera_color (tuple or list[tuple]): RGB color(s) for extra cameras.
     Returns:
         trimesh.Scene: Processed 3D scene containing point cloud and cameras
@@ -152,7 +365,27 @@ def predictions_to_glb(
         colors_rgb = images
     colors_rgb = (colors_rgb.reshape(-1, 3) * 255).astype(np.uint8)
-    conf = pred_world_points_conf.reshape(-1)
     # Convert percentage threshold to actual confidence value
     if conf_thres == 0.0:
         conf_threshold = 0.0
@@ -173,6 +406,48 @@ def predictions_to_glb(
     vertices_3d = vertices_3d[conf_mask]
     colors_rgb = colors_rgb[conf_mask]
     if vertices_3d is None or np.asarray(vertices_3d).size == 0:
         vertices_3d = np.array([[1, 0, 0]])

 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
+import logging
 import copy
 import os
+import cv2
+import matplotlib
+import numpy as np
 import requests
+import trimesh
+from scipy.spatial import cKDTree
+from scipy.spatial.transform import Rotation
+logger = logging.getLogger(__name__)
+def _srgb_to_linear(colors: np.ndarray) -> np.ndarray:
+    colors = np.clip(colors, 0.0, 1.0)
+    threshold = 0.04045
+    below = colors <= threshold
+    linear = np.empty_like(colors, dtype=np.float64)
+    linear[below] = colors[below] / 12.92
+    linear[~below] = ((colors[~below] + 0.055) / 1.055) ** 2.4
+    return linear
+def _linear_to_srgb(colors: np.ndarray) -> np.ndarray:
+    colors = np.clip(colors, 0.0, 1.0)
+    threshold = 0.0031308
+    srgb = np.empty_like(colors, dtype=np.float64)
+    below = colors <= threshold
+    srgb[below] = colors[below] * 12.92
+    srgb[~below] = 1.055 * np.power(colors[~below], 1 / 2.4) - 0.055
+    return np.clip(np.round(srgb * 255.0), 0, 255).astype(np.uint8)
+def voxel_reduce(
+    points_f32: np.ndarray,
+    colors_u8: np.ndarray,
+    conf_f32: np.ndarray | None = None,
+    voxel_size: float = 0.02,
+    origin: np.ndarray | None = None,
+) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
+    points = np.asarray(points_f32, dtype=np.float32)
+    colors = np.asarray(colors_u8, dtype=np.uint8)
+    if points.size == 0:
+        return (
+            points.reshape(-1, 3).astype(np.float32),
+            colors.reshape(-1, 3).astype(np.uint8),
+            np.zeros((points.shape[0],), dtype=np.float32),
+        )
+    if voxel_size is None or voxel_size <= 0:
+        weights = (
+            np.asarray(conf_f32, dtype=np.float32).reshape(-1)
+            if conf_f32 is not None
+            else np.ones(points.shape[0], dtype=np.float32)
+        )
+        return points.astype(np.float32), colors.astype(np.uint8), weights
+    weights = (
+        np.asarray(conf_f32, dtype=np.float32).reshape(-1)
+        if conf_f32 is not None
+        else np.ones(points.shape[0], dtype=np.float32)
+    )
+    if weights.shape[0] != points.shape[0]:
+        raise ValueError("conf_f32 must match the shape of points.")
+    base = (
+        np.asarray(origin, dtype=np.float32)
+        if origin is not None
+        else points.min(axis=0).astype(np.float32)
+    )
+    voxel_indices = np.floor((points - base) / voxel_size).astype(np.int64)
+    voxel_keys, inverse_indices, counts = np.unique(
+        voxel_indices, axis=0, return_inverse=True, return_counts=True
+    )
+    reduced_count = voxel_keys.shape[0]
+    accum_weights = np.bincount(inverse_indices, weights=weights, minlength=reduced_count)
+    accum_weights = np.where(accum_weights <= 0, 1e-6, accum_weights)
+    reduced_points = np.zeros((reduced_count, 3), dtype=np.float64)
+    for dim in range(3):
+        reduced_points[:, dim] = np.bincount(
+            inverse_indices,
+            weights=weights * points[:, dim],
+            minlength=reduced_count,
+        )
+    reduced_points /= accum_weights[:, None]
+    colors_linear = _srgb_to_linear(colors.astype(np.float32) / 255.0)
+    reduced_colors_linear = np.zeros((reduced_count, 3), dtype=np.float64)
+    for dim in range(3):
+        reduced_colors_linear[:, dim] = np.bincount(
+            inverse_indices,
+            weights=weights * colors_linear[:, dim],
+            minlength=reduced_count,
+        )
+    reduced_colors_linear /= accum_weights[:, None]
+    reduced_colors = _linear_to_srgb(reduced_colors_linear)
+    support = (
+        accum_weights.astype(np.float32)
+        if conf_f32 is not None
+        else counts.astype(np.float32)
+    )
+    return reduced_points.astype(np.float32), reduced_colors.astype(np.uint8), support
+def _filter_by_support(
+    points: np.ndarray,
+    colors: np.ndarray,
+    support: np.ndarray,
+    min_support: float | None,
+) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
+    if (
+        support is None
+        or support.size == 0
+        or min_support is None
+        or min_support <= 0
+    ):
+        return points, colors, support
+    mask = support >= float(min_support)
+    if not np.any(mask):
+        return points, colors, support
+    return points[mask], colors[mask], support[mask]
+def _log_point_count(stage: str, before: int, after: int) -> None:
+    if logger.isEnabledFor(logging.INFO):
+        logger.info("Point cloud %s: %d -> %d", stage, before, after)
+def o3d_outlier_filter(
+    points_f32: np.ndarray,
+    colors_u8: np.ndarray,
+    *,
+    voxel_size: float = 0.02,
+    radius_mult: float = 3.0,
+    nb_points: int = 16,
+    nb_neighbors: int = 48,
+    std_ratio: float = 1.5,
+) -> tuple[np.ndarray, np.ndarray]:
+    points = np.asarray(points_f32, dtype=np.float32)
+    colors = np.asarray(colors_u8, dtype=np.uint8)
+    if points.size == 0:
+        return points.reshape(-1, 3), colors.reshape(-1, 3)
+    try:
+        import open3d as o3d  # type: ignore
+    except ImportError:
+        logger.warning("Open3D not available; skipping outlier filtering.")
+        return points.astype(np.float32), colors.astype(np.uint8)
+    pcd = o3d.geometry.PointCloud()
+    pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))
+    pcd.colors = o3d.utility.Vector3dVector(colors.astype(np.float32) / 255.0)
+    effective_voxel = float(voxel_size) if voxel_size and voxel_size > 0 else 0.02
+    radius = max(float(radius_mult) * effective_voxel, 1e-4)
+    if nb_points > 0:
+        pcd, _ = pcd.remove_radius_outlier(nb_points=int(nb_points), radius=radius)
+    if len(pcd.points) == 0:
+        return np.empty((0, 3), dtype=np.float32), np.empty((0, 3), dtype=np.uint8)
+    if nb_neighbors > 0:
+        pcd, _ = pcd.remove_statistical_outlier(
+            nb_neighbors=int(nb_neighbors),
+            std_ratio=float(std_ratio),
+        )
+    if len(pcd.points) == 0:
+        return np.empty((0, 3), dtype=np.float32), np.empty((0, 3), dtype=np.uint8)
+    filtered_points = np.asarray(pcd.points, dtype=np.float32)
+    filtered_colors = np.asarray(pcd.colors, dtype=np.float32)
+    filtered_colors = np.clip(np.round(filtered_colors * 255.0), 0, 255).astype(np.uint8)
+    return filtered_points, filtered_colors
+def density_filter_points(
+    points_f32: np.ndarray,
+    colors_u8: np.ndarray,
+    *,
+    radius: float,
+    min_neighbors: int,
+) -> tuple[np.ndarray, np.ndarray]:
+    points = np.asarray(points_f32, dtype=np.float32)
+    colors = np.asarray(colors_u8, dtype=np.uint8)
+    if points.size == 0:
+        return points.reshape(-1, 3), colors.reshape(-1, 3)
+    radius = max(float(radius), 1e-4)
+    min_neighbors = max(int(min_neighbors), 1)
+    tree = cKDTree(points)
+    neighbor_lists = tree.query_ball_point(points, radius)
+    mask = np.fromiter((len(nlist) >= min_neighbors for nlist in neighbor_lists), dtype=bool, count=len(neighbor_lists))
+    return points[mask], colors[mask]
 def predictions_to_glb(
     prediction_mode="Predicted Pointmap",
     extra_cameras=None,
     extra_camera_color=(255, 0, 0),
+    voxel_size: float | None = 0.01,
+    voxel_after_conf: bool = True,
+    min_voxel_support: float | None = 3,
+    o3d_denoise: bool = True,
+    o3d_params: dict | None = {"radius_mult": 3.0, "nb_points": 16, "nb_neighbors": 48, "std_ratio": 1.5},
+    density_filter: bool = True,
+    density_params: dict | None = {"radius": 0.05, "min_neighbors": 6},
 ) -> trimesh.Scene:
     """
     Converts predictions to a 3D scene represented as a GLB file.
         extra_cameras (Optional[List[np.ndarray]]): Additional camera extrinsics (3x4 or 4x4)
             to visualize even when show_cam=False. Useful for highlighting localized poses.
         extra_camera_color (tuple or list[tuple]): RGB color(s) for extra cameras.
+        voxel_size (Optional[float]): Size of voxel grid cells (>0 enables reduction).
+        voxel_after_conf (bool): Apply voxel reduction after confidence/background filtering.
+        min_voxel_support (Optional[float]): Minimum aggregated support (confidence/count) per voxel.
+        o3d_denoise (bool): Enable Open3D outlier filtering.
+        o3d_params (Optional[dict]): Overrides for Open3D filtering parameters.
+        density_filter (bool): Apply KD-tree based density filtering.
+        density_params (Optional[dict]): Overrides for density filter parameters.
     Returns:
         trimesh.Scene: Processed 3D scene containing point cloud and cameras
         colors_rgb = images
     colors_rgb = (colors_rgb.reshape(-1, 3) * 255).astype(np.uint8)
+    conf = pred_world_points_conf.reshape(-1).astype(np.float32)
+    effective_voxel_size = float(voxel_size) if voxel_size is not None else None
+    if effective_voxel_size is not None and effective_voxel_size <= 0:
+        effective_voxel_size = None
+    if effective_voxel_size is not None and not voxel_after_conf:
+        before_count = vertices_3d.shape[0]
+        vertices_3d, colors_rgb, conf = voxel_reduce(
+            vertices_3d,
+            colors_rgb,
+            conf,
+            voxel_size=effective_voxel_size,
+        )
+        vertices_3d, colors_rgb, conf = _filter_by_support(
+            vertices_3d,
+            colors_rgb,
+            conf,
+            min_voxel_support,
+        )
+        _log_point_count("voxel_reduce_pre_conf", before_count, vertices_3d.shape[0])
     # Convert percentage threshold to actual confidence value
     if conf_thres == 0.0:
         conf_threshold = 0.0
     vertices_3d = vertices_3d[conf_mask]
     colors_rgb = colors_rgb[conf_mask]
+    conf_used = conf[conf_mask]
+    if effective_voxel_size is not None and voxel_after_conf and vertices_3d.size:
+        before_count = vertices_3d.shape[0]
+        vertices_3d, colors_rgb, conf_used = voxel_reduce(
+            vertices_3d,
+            colors_rgb,
+            conf_used,
+            voxel_size=effective_voxel_size,
+        )
+        vertices_3d, colors_rgb, conf_used = _filter_by_support(
+            vertices_3d,
+            colors_rgb,
+            conf_used,
+            min_voxel_support,
+        )
+        _log_point_count("voxel_reduce_post_conf", before_count, vertices_3d.shape[0])
+    if o3d_denoise and vertices_3d.size:
+        before_count = vertices_3d.shape[0]
+        params = {
+            "voxel_size": effective_voxel_size or 0.02,
+            "radius_mult": 3.0,
+            "nb_points": 16,
+            "nb_neighbors": 48,
+            "std_ratio": 1.5,
+        }
+        if o3d_params:
+            params.update(o3d_params)
+        vertices_3d, colors_rgb = o3d_outlier_filter(vertices_3d, colors_rgb, **params)
+        _log_point_count("o3d_denoise", before_count, vertices_3d.shape[0])
+    if density_filter and vertices_3d.size:
+        before_count = vertices_3d.shape[0]
+        params = {
+            "radius": (effective_voxel_size or 0.02) * 2.5,
+            "min_neighbors": 5,
+        }
+        if density_params:
+            params.update(density_params)
+        vertices_3d, colors_rgb = density_filter_points(vertices_3d, colors_rgb, **params)
+        _log_point_count("density_filter", before_count, vertices_3d.shape[0])
     if vertices_3d is None or np.asarray(vertices_3d).size == 0:
         vertices_3d = np.array([[1, 0, 0]])
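
Note on the hunk above: the bodies of `voxel_reduce` and `_filter_by_support` are not visible in this part of the diff. The sketch below is a guess at their behaviour, inferred from `tests/test_voxel_reduction.py`, which expects a confidence-weighted centroid per voxel and a per-voxel support equal to the summed confidence. The names `voxel_reduce_sketch` and `filter_by_support_sketch` are hypothetical, and the real code in `visual_utils.py` may differ.

```python
import numpy as np


def voxel_reduce_sketch(points, colors, conf, voxel_size):
    """Illustrative stand-in, NOT the implementation from visual_utils.py."""
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    weights = np.clip(np.asarray(conf, dtype=np.float64), 1e-6, None)
    if points.shape[0] == 0:
        return points.astype(np.float32), colors.astype(np.uint8), weights.astype(np.float32)

    # Integer voxel coordinates: points with identical rows share a cell.
    cells = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # shape differs across NumPy versions
    n_voxels = int(inverse.max()) + 1

    # Assumed definition of support: summed confidence of a voxel's points.
    support = np.bincount(inverse, weights=weights, minlength=n_voxels)

    # Confidence-weighted mean of positions and colors, one channel at a time.
    merged_points = np.empty((n_voxels, 3))
    merged_colors = np.empty((n_voxels, 3))
    for axis in range(3):
        merged_points[:, axis] = (
            np.bincount(inverse, weights=points[:, axis] * weights, minlength=n_voxels) / support
        )
        merged_colors[:, axis] = (
            np.bincount(inverse, weights=colors[:, axis] * weights, minlength=n_voxels) / support
        )

    merged_colors = np.clip(np.rint(merged_colors), 0, 255).astype(np.uint8)
    return merged_points.astype(np.float32), merged_colors, support.astype(np.float32)


def filter_by_support_sketch(points, colors, support, min_support):
    """Assumed behaviour: drop voxels whose support is below the threshold."""
    if min_support is None:
        return points, colors, support
    keep = support >= float(min_support)
    return points[keep], colors[keep], support[keep]
```

Under these assumptions, two points at x=0.0 (confidence 1) and x=0.01 (confidence 3) in the same 0.05 cell merge to x=(0*1 + 0.01*3)/4 = 0.0075, which is the value the test asserts.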

stream3r/worker/config.py CHANGED

@@ -73,7 +73,7 @@ class WorkerSettings:
     gpu_lock_timeout: int = 3600
     gpu_lock_blocking_timeout: int = 600
-    storage_prefix: str = "scene"
     s3_bucket: str | None = None
     s3_endpoint_url: str | None = None
     s3_region: str | None = None
@@ -114,9 +114,10 @@ class WorkerSettings:
     scene_media_api_base_url: str | None = None
     scene_media_api_token: str | None = None
     scene_media_page_size: int = 200
-    stream_window_size: int = 20
     max_frames_per_job: int = 0
-    default_job_timeout: int = 15 * 60
     @classmethod
     def from_env(cls) -> "WorkerSettings":
@@ -212,6 +213,9 @@ class WorkerSettings:
             "default_job_timeout": _env_int(
                 "STREAM3R_JOB_TIMEOUT", base.default_job_timeout
             ),
         }
         return cls(**kwargs)

     gpu_lock_timeout: int = 3600
     gpu_lock_blocking_timeout: int = 600
+    storage_prefix: str = "scenes"
     s3_bucket: str | None = None
     s3_endpoint_url: str | None = None
     s3_region: str | None = None
     scene_media_api_base_url: str | None = None
     scene_media_api_token: str | None = None
     scene_media_page_size: int = 200
+    stream_window_size: int = 14
     max_frames_per_job: int = 0
+    default_job_timeout: int = 45 * 60
+    upload_session_cache: bool = True
     @classmethod
     def from_env(cls) -> "WorkerSettings":
             "default_job_timeout": _env_int(
                 "STREAM3R_JOB_TIMEOUT", base.default_job_timeout
             ),
+            "upload_session_cache": _env_bool(
+                "STREAM3R_UPLOAD_CACHE", base.upload_session_cache
+            ),
         }
         return cls(**kwargs)
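
Note on `upload_session_cache`: the new setting is read through `_env_bool`, whose definition is outside this diff. As a rough illustration of how such a helper could treat `STREAM3R_UPLOAD_CACHE`, here is a minimal sketch. The function name `env_bool_sketch` and the accepted spellings are assumptions, not the project's actual parsing rules.

```python
import os

# Assumed spellings; the real helper may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def env_bool_sketch(name: str, default: bool) -> bool:
    """Read a boolean from the environment, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    # Unrecognised text: keep the dataclass default rather than guessing.
    return default
```

With a helper like this, exporting `STREAM3R_UPLOAD_CACHE=0` would turn off the session cache upload that `_upload_cache` now guards in `tasks.py`.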

stream3r/worker/tasks.py CHANGED

@@ -15,6 +15,7 @@ from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
 from contextlib import nullcontext
 from typing import Any, Callable, Mapping
 import numpy as np
@@ -471,6 +472,12 @@ def _upload_cache(
 ) -> str | None:
     if cache_path is None or not cache_path.exists():
         return None
     key = runtime.storage.build_key(
         scene_id,
         runtime.settings.models_dir,
@@ -575,7 +582,7 @@ def _save_scene_glb(
         filter_by_frames=payload.get("frame_filter", "All"),
         mask_black_bg=_as_bool(payload.get("mask_black_bg"), False),
         mask_white_bg=_as_bool(payload.get("mask_white_bg"), False),
-        show_cam=_as_bool(payload.get("show_cam"), True),
         mask_sky=_as_bool(payload.get("mask_sky"), False),
         target_dir=str(temp_dir),
         prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
@@ -837,6 +844,30 @@ def _execute_job(job_type: str, payload: Mapping[str, Any], handler: JobHandler)
         "scene_id": scene_id,
     }
     runtime.db.upsert_job(
         job_id=job_id,
         job_type=job_type,
@@ -862,6 +893,7 @@ def _execute_job(job_type: str, payload: Mapping[str, Any], handler: JobHandler)
             with tempfile.TemporaryDirectory(prefix=f"stream3r_{job_id}_") as tmp_dir:
                 temp_path = Path(tmp_dir)
                 frame_records = _collect_frames(runtime, scene_id, payload, temp_path)
                 cache_path = temp_path / runtime.settings.session_cache_filename if streaming else None
                 tracker = ProgressTracker(runtime, job_meta)
@@ -874,6 +906,7 @@ def _execute_job(job_type: str, payload: Mapping[str, Any], handler: JobHandler)
                     progress_cb=tracker,
                     window_size=window_size if streaming and mode == "window" else None,
                 )
                 session_settings = _prepare_session_settings(
                     payload,
@@ -895,6 +928,7 @@ def _execute_job(job_type: str, payload: Mapping[str, Any], handler: JobHandler)
                     session_settings=session_settings,
                     temp_dir=temp_path,
                 )
     except Exception as exc:
         error_text = traceback.format_exc()
@@ -914,9 +948,17 @@ def _execute_job(job_type: str, payload: Mapping[str, Any], handler: JobHandler)
                 "error": str(exc),
             },
         )
-        logger.exception("Job %s failed", job_id)
         raise
     runtime.db.upsert_job(
         job_id=job_id,
         job_type=job_type,

 from datetime import datetime, timezone
 from pathlib import Path
 from contextlib import nullcontext
+from time import perf_counter
 from typing import Any, Callable, Mapping
 import numpy as np
 ) -> str | None:
     if cache_path is None or not cache_path.exists():
         return None
+    if not runtime.settings.upload_session_cache:
+        logger.debug(
+            "Skipping session cache upload for scene %s (disabled via settings)",
+            scene_id,
+        )
+        return None
     key = runtime.storage.build_key(
         scene_id,
         runtime.settings.models_dir,
         filter_by_frames=payload.get("frame_filter", "All"),
         mask_black_bg=_as_bool(payload.get("mask_black_bg"), False),
         mask_white_bg=_as_bool(payload.get("mask_white_bg"), False),
+        show_cam=_as_bool(payload.get("show_cam"), False),
         mask_sky=_as_bool(payload.get("mask_sky"), False),
         target_dir=str(temp_dir),
         prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
         "scene_id": scene_id,
     }
+    logger.info(
+        "Job %s (%s) started for scene %s (timeout=%s)",
+        job_id,
+        job_type,
+        scene_id,
+        applied_timeout or desired_timeout or "default",
+    )
+    start_time = perf_counter()
+    last_time = start_time
+    def log_progress(stage: str) -> None:
+        nonlocal last_time
+        now = perf_counter()
+        logger.info(
+            "Job %s (%s): %s [delta=%.2fs total=%.2fs]",
+            job_id,
+            job_type,
+            stage,
+            now - last_time,
+            now - start_time,
+        )
+        last_time = now
     runtime.db.upsert_job(
         job_id=job_id,
         job_type=job_type,
             with tempfile.TemporaryDirectory(prefix=f"stream3r_{job_id}_") as tmp_dir:
                 temp_path = Path(tmp_dir)
                 frame_records = _collect_frames(runtime, scene_id, payload, temp_path)
+                log_progress(f"collected frames ({len(frame_records)} items)")
                 cache_path = temp_path / runtime.settings.session_cache_filename if streaming else None
                 tracker = ProgressTracker(runtime, job_meta)
                     progress_cb=tracker,
                     window_size=window_size if streaming and mode == "window" else None,
                 )
+                log_progress(f"inference completed ({inference.total_frames} frames)")
                 session_settings = _prepare_session_settings(
                     payload,
                     session_settings=session_settings,
                     temp_dir=temp_path,
                 )
+                log_progress("artifact generation completed")
     except Exception as exc:
         error_text = traceback.format_exc()
                 "error": str(exc),
             },
         )
+        logger.exception(
+            "Job %s (%s) failed after %.2fs: %s",
+            job_id,
+            job_type,
+            perf_counter() - start_time,
+            exc,
+        )
         raise
+    log_progress("job finished")
     runtime.db.upsert_job(
         job_id=job_id,
         job_type=job_type,
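
Note on `log_progress`: the closure added to `_execute_job` keeps two timestamps, the job start and the previous checkpoint, and reports both the per-stage delta and the running total. The standalone sketch below isolates that pattern so it can be tried outside the worker. `make_stage_timer` and its injectable `clock` parameter are hypothetical additions for illustration; the diff itself calls `perf_counter` directly and writes to `logger.info`.

```python
from time import perf_counter
from typing import Callable, Tuple


def make_stage_timer(
    clock: Callable[[], float] = perf_counter,
) -> Callable[[str], Tuple[str, float, float]]:
    """Return a function that reports (stage, delta, total) per checkpoint."""
    start = clock()
    last = start

    def mark(stage: str) -> Tuple[str, float, float]:
        nonlocal last  # rebinding the enclosing variable, as in log_progress
        now = clock()
        record = (stage, now - last, now - start)
        last = now
        return record

    return mark


if __name__ == "__main__":
    mark = make_stage_timer()
    stage, delta, total = mark("collected frames")
    print(f"{stage} [delta={delta:.2f}s total={total:.2f}s]")
```

Passing the clock in makes the timing logic testable with a fake clock, which is one reason to factor it out if the closure grows.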

tests/test_voxel_reduction.py ADDED

	@@ -0,0 +1,114 @@

+import numpy as np
+import pytest
+import trimesh
+from stream3r.utils.visual_utils import (
+    density_filter_points,
+    o3d_outlier_filter,
+    predictions_to_glb,
+    voxel_reduce,
+)
+def test_voxel_reduce_merges_points():
+    points = np.array(
+        [
+            [0.0, 0.0, 0.0],
+            [0.01, 0.0, 0.0],
+            [0.2, 0.0, 0.0],
+        ],
+        dtype=np.float32,
+    )
+    colors = np.array(
+        [
+            [255, 0, 0],
+            [255, 0, 0],
+            [0, 255, 0],
+        ],
+        dtype=np.uint8,
+    )
+    conf = np.array([1.0, 3.0, 1.0], dtype=np.float32)
+    reduced_points, reduced_colors, support = voxel_reduce(points, colors, conf, voxel_size=0.05)
+    assert reduced_points.shape[0] == 2
+    assert reduced_colors.shape[0] == 2
+    assert np.isclose(support.sum(), conf.sum(), atol=1e-3)
+    merged_idx = int(np.argmin(reduced_points[:, 0]))
+    assert pytest.approx(reduced_points[merged_idx, 0], rel=1e-3) == 0.0075
+    assert reduced_colors[merged_idx, 0] >= 250
+    assert reduced_colors[1 - merged_idx, 1] >= 250
+def test_o3d_outlier_filter_removes_outlier():
+    pytest.importorskip("open3d", reason="Open3D not installed")
+    cluster = np.zeros((20, 3), dtype=np.float32)
+    outlier = np.array([[1.0, 1.0, 1.0]], dtype=np.float32)
+    points = np.vstack([cluster, outlier])
+    colors = np.tile(np.array([[128, 128, 255]], dtype=np.uint8), (points.shape[0], 1))
+    filtered_points, _ = o3d_outlier_filter(
+        points,
+        colors,
+        voxel_size=0.05,
+        radius_mult=2.5,
+        nb_points=4,
+        nb_neighbors=8,
+        std_ratio=1.0,
+    )
+    assert filtered_points.shape[0] < points.shape[0]
+def test_predictions_to_glb_voxel_pipeline():
+    world_points = np.array(
+        [
+            [
+                [[0.0, 0.0, 0.0], [0.02, 0.0, 0.0]],
+                [[0.0, 0.02, 0.0], [0.02, 0.02, 0.0]],
+            ]
+        ],
+        dtype=np.float32,
+    )
+    predictions = {
+        "world_points": world_points,
+        "world_points_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "world_points_from_depth": world_points,
+        "depth_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "images": np.ones((1, 2, 2, 3), dtype=np.float32) * 0.5,
+        "extrinsic": np.array(
+            [
+                [
+                    [1.0, 0.0, 0.0, 0.0],
+                    [0.0, 1.0, 0.0, 0.0],
+                    [0.0, 0.0, 1.0, 0.0],
+                ]
+            ],
+            dtype=np.float32,
+        ),
+    }
+    scene = predictions_to_glb(
+        predictions,
+        conf_thres=0.0,
+        voxel_size=0.05,
+        o3d_denoise=False,
+        density_filter=False,
+    )
+    assert isinstance(scene, trimesh.Scene)
+    assert len(scene.geometry) >= 1
+def test_density_filter_points_removes_isolated_samples():
+    rng = np.random.default_rng(0)
+    cluster = rng.normal(scale=0.005, size=(50, 3)).astype(np.float32)
+    outliers = np.array([[0.4, 0.4, 0.4], [0.6, 0.6, 0.6]], dtype=np.float32)
+    points = np.vstack([cluster, outliers])
+    colors = np.tile(np.array([[200, 200, 200]], dtype=np.uint8), (points.shape[0], 1))
+    filtered_points, _ = density_filter_points(points, colors, radius=0.05, min_neighbors=5)
+    assert filtered_points.shape[0] < points.shape[0]
+    assert np.all(filtered_points.max(axis=0) < 0.2)
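
Note on `density_filter_points`: the docstring describes it as a KD-tree based density filter, and the test above expects isolated samples to be dropped. Its body is not shown in this part of the diff. The sketch below shows the same idea with a brute-force pairwise distance matrix in place of a KD-tree, which is only practical for small inputs. The name `density_filter_sketch` is hypothetical, and whether the real function counts a point as its own neighbour is an assumption here (this version does not).

```python
import numpy as np


def density_filter_sketch(points, colors, radius, min_neighbors):
    """Brute-force O(N^2) stand-in for a KD-tree radius count (illustrative only)."""
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors)
    if points.shape[0] == 0:
        return points, colors

    # Squared distance between every pair of points.
    diff = points[:, None, :] - points[None, :, :]
    dist_sq = np.einsum("ijk,ijk->ij", diff, diff)

    # Neighbours within `radius`, minus one to exclude the point itself.
    neighbor_counts = (dist_sq <= radius * radius).sum(axis=1) - 1

    keep = neighbor_counts >= min_neighbors
    return points[keep], colors[keep]
```

For real point clouds a spatial index is the appropriate tool, since the pairwise matrix grows quadratically with the number of points.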