brian4dwell committed on
Commit
01e8928
·
1 Parent(s): de1bede

split key framer out

design_docs/keyframe_service.md ADDED
@@ -0,0 +1,178 @@
1
+ ```markdown
2
+ # Keyframe Selection Service
3
+
4
+ **Author:** Brian Clark
5
+ **Last Updated:** 2025-11-07
6
+ **Audience:** Integrators needing motion-aware keyframes instead of linear sampling
7
+
8
+ ---
9
+
10
+ ## Overview
11
+
12
+ The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).
13
+
14
+ The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.
15
+
16
+ ---
17
+
18
+ ## Job Inputs
19
+
20
+ | Field | Required | Description |
21
+ |-------|----------|-------------|
22
+ | `scene_id` | ✅ | Scene identifier used in `scene_media` |
23
+ | `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
24
+ | `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
25
+ | `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
26
+ | `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
27
+ | `ceiling_percentile`, `ceiling_margin`, `ceiling_z_max` | optional | Pass-through filters for downstream GLB cleanup |
28
+
29
+ Example payload:
30
+
31
+ ```json
32
+ {
33
+ "job_type": "keyframe_selection",
34
+ "scene_id": "scene-123",
35
+ "video_key": "scene-data/videos/scene-123.mp4",
36
+ "top_k": 16,
37
+ "extract_fps": 6.0
38
+ }
39
+ ```
40
+
41
+ Enqueue via RQ:
42
+
43
+ ```python
44
+ from rq import Queue
45
+ from redis import Redis
46
+
47
+ queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))
48
+ queue.enqueue("worker.stream3r.jobs.handle_job", {
49
+ "job_type": "keyframe_selection",
50
+ "scene_id": "scene-123",
51
+ "video_key": "scene-data/videos/scene-123.mp4"
52
+ })
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Processing Pipeline
58
+
59
+ 1. **Download & Extract Frames**
60
+ - Video pulled via `runtime.storage.download_to_path`.
61
+ - Frames decoded with OpenCV at `extract_fps` (default 6 fps) up to `extract_max_frames`.
62
+
63
+ 2. **Motion + Coverage Pre-pass**
64
+ - Lightweight Stream3R windowed inference run to collect poses (`extrinsic`) and confidences.
65
+ - Motion scoring: translation + weighted rotation deltas, greedily ensures pose diversity.
66
+ - Coverage scoring: counts high-confidence voxel IDs contributed per frame, greedily maximizes new coverage.
67
+ - Diagnostics stored per frame (reason, motion delta, coverage gain, confidence).
68
+ - If inference fails, fallback to linear sampling.
69
+
70
+ 3. **Selection & Upload**
71
+ - Selected frames copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
72
+ - Scene media rows inserted through `record_scene_media_entries` API.
73
+ - Manifest (`selected_frames`) returned with storage keys and diagnostics.
74
+
75
+ 4. **Result**
76
+ - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics.
77
+ - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
78
+ - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs and skips registration without failing the job.
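The motion scoring in step 2 (translation plus weighted rotation deltas, applied greedily) can be sketched as follows. This is an illustrative simplification, not the worker's actual function; the real pre-pass also folds in coverage gain and confidence:

```python
import numpy as np

def select_keyframes(positions, rotations, top_k=16, motion_thresh=0.4, rot_weight=0.5):
    """Greedy motion-based selection: keep a frame once its pose has moved far
    enough from the last kept frame. `positions` is (N, 3) camera centers and
    `rotations` is (N, 3, 3) rotation matrices from the pose pre-pass."""
    kept = [0]  # always keep the first frame
    for i in range(1, len(positions)):
        j = kept[-1]
        trans_delta = float(np.linalg.norm(positions[i] - positions[j]))
        # geodesic rotation angle between the two camera orientations
        cos_angle = (np.trace(rotations[j].T @ rotations[i]) - 1.0) / 2.0
        rot_delta = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        if trans_delta + rot_weight * rot_delta >= motion_thresh:
            kept.append(i)
        if len(kept) >= top_k:
            break
    return kept
```

The thresholds mirror the `STREAM3R_KEYFRAME_MOTION_THRESH` and `STREAM3R_KEYFRAME_ROT_WEIGHT` defaults described in the Configuration section.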
79
+
80
+ ---
81
+
82
+ ## Outputs & Diagnostics
83
+
84
+ Result payload (simplified):
85
+
86
+ ```json
87
+ {
88
+ "job_id": "...",
89
+ "scene_id": "scene-123",
90
+ "video_key": "scene-data/videos/scene-123.mp4",
91
+ "native_fps": 29.97,
92
+ "total_frames": 420,
93
+ "selected_frames": [
94
+ {
95
+ "frame_id": "frame_000012",
96
+ "frame_index": 12,
97
+ "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
98
+ "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
99
+ "diagnostics": {
100
+ "reason": "motion",
101
+ "motion_delta": 0.42,
102
+ "coverage_gain_ratio": 0.08,
103
+ "mean_confidence": 0.67
104
+ }
105
+ }
106
+ ]
107
+ }
108
+ ```
109
+
110
+ Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
111
+
112
+ ---
113
+
114
+ ## Configuration
115
+
116
+ Environment variables for fine-tuning:
117
+
118
+ | Env Var | Default | Notes |
119
+ |---------|---------|-------|
120
+ | `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
121
+ | `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
122
+ | `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
123
+ | `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
124
+ | `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
125
+ | `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
126
+ | `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
127
+ | `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
128
+ | `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
129
+ | `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when below |
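Resolving the table above into a config dict might look like this sketch (the function name and return shape are hypothetical; only the variable names and defaults come from the table):

```python
import os

def keyframe_config(env=os.environ):
    """Resolve keyframe tuning knobs with the documented defaults."""
    return {
        "extract_fps": float(env.get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        "extract_max_frames": int(env.get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        "upload_dir": env.get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        "top_k": int(env.get("STREAM3R_KEYFRAME_TOP_K", "16")),
        "prepass": env.get("STREAM3R_KEYFRAME_PREPASS", "1") == "1",
        "motion_thresh": float(env.get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        "rot_weight": float(env.get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        "min_gain": float(env.get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        "full_max_frames": int(env.get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    }
```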
130
+
131
+ Scene media API requirements:
132
+ - `STREAM3R_MEDIA_API_BASE_URL`
133
+ - `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts
134
+
135
+ Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
136
+
137
+ ---
138
+
139
+ ## Integration Steps for External Services
140
+
141
+ 1. **Deploy Worker Queue**
142
+ - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the env var is set).
143
+ - Note the GPU requirement: the pre-pass runs Stream3R inference, so prefer GPU hosts; CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu` and accept slower pre-passes.
144
+
145
+ 2. **Enqueue Jobs**
146
+ - Replace existing linear sampling code with an RQ enqueue call.
147
+ - Store job IDs if you need to poll job status or consume events.
148
+
149
+ 3. **Consume Results**
150
+ - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries.
151
+ - Inspect returned diagnostics for debugging or to render navigation overlays.
152
+
153
+ 4. **Fallback Handling**
154
+ - If the job fails, the queue returns error details; you can revert to your existing sampler.
155
+ - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
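The retry suggestion in step 4 can be captured in a small client-side helper (`retry_payload` is hypothetical, not part of the worker; the halving schedule is one reasonable choice):

```python
DEFAULT_TOP_K = 16

def retry_payload(original, attempt):
    """Build a retry payload for a failed keyframe_selection job.
    Halves the selection budget each attempt (floored at 4) so retries
    get progressively cheaper; the original payload is left untouched."""
    top_k = max(4, original.get("top_k", DEFAULT_TOP_K) // (2 ** attempt))
    retry = dict(original)
    retry["top_k"] = top_k
    return retry
```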
156
+
157
+ ---
158
+
159
+ ## Benefits vs. Linear Sampling
160
+
161
+ - **Fewer redundant frames**: motion-aware spacing ensures pose diversity.
162
+ - **Better geometry coverage**: only keeps frames that add new high-confidence voxels.
163
+ - **Consistent diagnostics**: each selected frame includes reasons and confidence, aiding QA.
164
+ - **Automatic uploads**: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
165
+
166
+ ---
167
+
168
+ ## Future Enhancements
169
+
170
+ - Optional semantic filtering to avoid ceilings/walls.
171
+ - Exposure of thumbnails or depth maps alongside keyframes.
172
+ - Batch selection across multiple videos.
173
+
174
+ ---
175
+
176
+ For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.
177
+
178
+ ```
docs/api_usage.md ADDED
@@ -0,0 +1,598 @@
1
+ # Scene Graph Manager API Usage
2
+
3
+ This document gives a concise, high-signal overview of the HTTP API so other services can integrate without reading the whole codebase.
4
+
5
+ ## Refresh Checklist (for future updates)
6
+
7
+ Use this prompt when endpoints change:
8
+
9
+ > Run `rg "@app" -n api/app.py` and list new/modified routes. Update `docs/api_usage.md` with:
10
+ > - Endpoint table (method, path, summary)
11
+ > - Auth / header requirements
12
+ > - Request & response JSON samples (curl when useful)
13
+ > - Notes on query params, error behaviors, and streaming endpoints.
14
+ > Re-run `python -m compileall api/app.py` and ensure the doc still reflects reality.
15
+
16
+ ## Base URL
17
+
18
+ All paths below are relative to:
19
+
20
+ ```
21
+ https://scene-graph-mgr-api.fly.dev
22
+ ```
23
+
24
+ ## Authentication
25
+
26
+ - Public endpoints are unauthenticated but subject to Fly.io rate limits.
27
+ - Internal write endpoints require a shared secret header: `x-internal-secret: <secret>`.
28
+ - Standard error payload:
29
+
30
+ ```json
31
+ {
32
+ "detail": "Human readable message",
33
+ "error_code": "optional-machine-code"
34
+ }
35
+ ```
36
+
37
+ 4xx indicates caller issues (validation, missing scene). 5xx means server-side failure.
38
+
39
+ ## Endpoint Map
40
+
41
+ | Method | Path | Summary |
42
+ | --- | --- | --- |
43
+ | GET | `/healthz` | Liveness check. |
44
+ | GET | `/scenes` | List scenes with summary metadata (optional `include_empty`). |
45
+ | PUT | `/scenes/{scene_id}` | Overwrite a scene with a full graph payload. |
46
+ | PATCH | `/scenes/{scene_id}` | Apply an RFC 6902 patch to the latest graph. |
47
+ | POST | `/scenes/{scene_id}/create` | Seed an empty scene (optionally overwrite). |
48
+ | GET | `/scenes/{scene_id}/versions/latest` | Latest graph version. |
49
+ | GET | `/scenes/{scene_id}/versions` | Metadata for all versions. |
50
+ | GET | `/scenes/{scene_id}/versions/{version_id}` | Raw version record (JSON). |
51
+ | POST | `/scenes/{scene_id}/add_image` | Upload image bytes and enqueue processing. |
52
+ | POST | `/scenes/{scene_id}/add_images_from_keys` | Enqueue processing for existing S3 keys. |
53
+ | POST | `/scenes/{scene_id}/upload_video` | Upload video and queue frame extraction batch. |
54
+ | POST | `/scenes/{scene_id}/upload_image` | Browser upload → WebP resize → presigned URL. |
55
+ | GET | `/scenes/{scene_id}/instances` | List stored instances for the scene (status filters optional). |
56
+ | GET | `/scenes/{scene_id}/objects/{obj_id}/beliefs` | Latest improver belief for an object. |
57
+ | GET | `/scenes/{scene_id}/objects/{obj_id}/instances` | List stored instances for an object (status filters optional). |
58
+ | GET | `/scenes/{scene_id}/improver/backlog` | Inspect improver queue/backlog for a scene. |
59
+ | POST | `/scenes/{scene_id}/objects/{obj_id}/instances/{instance_id}/improve_instance` | Queue improver workflow for an instance. |
60
+ | GET | `/scenes/{scene_id}/runlogs` | Recent improver run logs (filterable). |
61
+ | GET | `/scenes/{scene_id}/change-requests` | Inspect change request queue (pending/applied). |
62
+ | GET | `/scenes/{scene_id}/change-summaries` | Stream applied change summaries (newest first). |
63
+ | GET | `/scenes/{scene_id}/diff` | Structural diff between two versions (machine patch & stats). |
64
+ | GET | `/scenes/{scene_id}/diff-semantic` | Semantic diff (ID-aware ops). |
65
+ | GET | `/scenes/{scene_id}/images/presign` | Generate presigned GET URL for an image key. |
66
+ | GET | `/scenes/{scene_id}/media` | List recorded media assets (images/videos). |
67
+ | POST | `/scenes/{scene_id}/media` | Upsert scene media entries supplied by workers or services. |
68
+ | GET | `/jobs/{job_id}` | Inspect queued/finished RQ jobs. |
69
+ | POST | `/stream3r/jobs` | Submit a reconstruction job (pose pointmap or model build). |
70
+ | GET | `/stream3r/jobs/{job_id}` | Inspect a Stream3R job record. |
71
+ | GET | `/stream3r/jobs` | List Stream3R jobs (filter by scene, type, status). |
72
+ | GET | `/stream3r/jobs/{job_id}/events` | Server-Sent Events feed for job lifecycle updates. |
73
+ | GET | `/stream3r/models/{scene_id}/presign` | Presign the latest model_build scene.glb for download. |
74
+ | POST | `/internal/commit-version` | Worker/internal scene commit (requires secret). |
75
+ | GET | `/debug/s3-ping` | Smoke-test S3/B2 credentials. |
76
+ | WS | `/ws?channel=all|{scene_id}` | Broadcasts scene events over WebSocket. |
77
+
78
+ ---
79
+
80
+ ## Endpoint Details & Examples
81
+
82
+ ### Health & Diagnostics
83
+
84
+ #### `GET /healthz`
85
+ Returns `{ "ok": true, "ts": "ISO timestamp" }` for uptime monitoring.
86
+
87
+ #### `GET /debug/s3-ping`
88
+ Verifies object storage connectivity using a put/delete round-trip. Good for smoke tests.
89
+
90
+ ### Scene Lifecycle
91
+
92
+ #### `GET /scenes`
93
+ List scenes. Optional `include_empty=true` includes scenes with no versions yet.
94
+
95
+ ```bash
96
+ curl "https://scene-graph-mgr-api.fly.dev/scenes?include_empty=true"
97
+ ```
98
+
99
+ Response contains summaries with counts and most recent version metadata.
100
+
101
+ #### `PUT /scenes/{scene_id}`
102
+ Replace entire scene graph.
103
+
104
+ Request body (`SceneGraphEnvelope`):
105
+ ```json
106
+ {
107
+ "scene_location_id": "scene-123",
108
+ "scene_graph": {"objects": [], "relations": []},
109
+ "meta": {"source": "manual"}
110
+ }
111
+ ```
112
+ Creates a new version record and broadcasts `scene.put` on WebSocket.
113
+
114
+ #### `PATCH /scenes/{scene_id}`
115
+ Apply RFC 6902 patch to latest graph. Optional `base_version` guard prevents lost updates.
116
+
117
+ ```json
118
+ {
119
+ "scene_location_id": "scene-123",
120
+ "json_patch": [
121
+ {"op": "add", "path": "/objects/-", "value": {"id": "chair-1", "attributes": {}}}
122
+ ],
123
+ "base_version": 1728400000000
124
+ }
125
+ ```
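A client can assemble this body programmatically before sending it with its HTTP library of choice. A minimal sketch (only the field names come from the example above; the helper itself is hypothetical):

```python
def build_patch_request(scene_id, operations, base_version=None):
    """Assemble a PATCH body with the optional optimistic-concurrency guard.
    `operations` is a list of RFC 6902 operation objects."""
    body = {"scene_location_id": scene_id, "json_patch": list(operations)}
    if base_version is not None:
        body["base_version"] = base_version
    return body
```

On a stale `base_version` guard, expect a 4xx error payload per the convention in the Authentication section.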
126
+
127
+ #### `POST /scenes/{scene_id}/create`
128
+ Seeds an empty graph. `overwrite=true` allows reseeding existing scenes.
129
+
130
+ #### Version Reads
131
+ - `GET /scenes/{scene_id}/versions/latest` → latest full graph.
132
+ - `GET /scenes/{scene_id}/versions` → list of `{version_id, created_at, bytes}`.
133
+ - `GET /scenes/{scene_id}/versions/{version_id}` → raw record (graph + metadata).
134
+
135
+ ### Image Upload & Processing
136
+
137
+ #### `POST /scenes/{scene_id}/add_image`
138
+ Multipart upload (`file=@image.jpg`). Stores bytes via S3-compatible API, seeds scene if empty, enqueues `worker.tasks.process_image_for_scene`. Response includes RQ `job_id` and filename.
139
+
140
+ #### `POST /scenes/{scene_id}/add_images_from_keys`
141
+ JSON body:
142
+ ```json
143
+ {
144
+ "keys": ["scenes/scene-123/images/20241008/a.png"],
145
+ "room_hint": "living_room",
146
+ "prompt": "describe objects",
147
+ "bounding_boxes": {"a.png": [[0.1,0.2,0.5,0.8]]}
148
+ }
149
+ ```
150
+ Seeds scene if needed and enqueues batch worker job.
151
+
152
+ Response (`AddImagesFromKeysResponse`):
153
+
154
+ ```json
155
+ {
156
+ "scene_location_id": "scene-123",
157
+ "queued_at": "2024-10-08T14:12:05Z",
158
+ "job_id": "rq-job-id",
159
+ "keys": [
160
+ "scenes/scene-123/images/20241008/a.png",
161
+ "scenes/scene-123/images/20241008/b.png"
162
+ ]
163
+ }
164
+ ```
165
+
166
+ #### `POST /scenes/{scene_id}/upload_image`
167
+ For browser uploads. Accepts multipart file, resizes to WebP (max width 1024), uploads, and returns:
168
+ ```json
169
+ {
170
+ "scene_location_id": "scene-123",
171
+ "key": "scenes/scene-123/images/20241008/abc.webp",
172
+ "url": "https://...presigned...",
173
+ "width": 768,
174
+ "height": 512,
175
+ "bytes": 123456
176
+ }
177
+ ```
178
+
179
+ #### `GET /scenes/{scene_id}/images/presign`
180
+ Query parameters: `key` (must reside under `scenes/{scene_id}/images/…` **or** `scenes/{scene_id}/videos/…`) and optional `expires`. Returns `{ "url": "...", "expires_in": 900 }`.
181
+
182
+ ### Video Upload & Frame Extraction
183
+
184
+ #### `POST /scenes/{scene_id}/upload_video`
185
+ Accepts a multipart video upload (e.g., `file=@walkthrough.mp4`). Stores the binary in object storage and seeds an empty scene when needed. The worker enqueues `process_video_for_scene`, which:
186
+
187
+ - extracts WebP frames at 2fps by default (`frame_interval=0.5` seconds) and stores them under `scenes/{scene_id}/images/video_frames/...` (or keeps the source under `scenes/{scene_id}/videos/...`) so the `/images/presign` endpoint can serve them;
188
+ - retries with a software H.264 transcode if AV1 or other codecs fail to decode on the host;
189
+ - publishes the same `scene.update` event stream as still-image uploads.
190
+
191
+ Response payload (`UploadVideoResponse`):
192
+
193
+ ```json
194
+ {
195
+ "scene_location_id": "scene-123",
196
+ "key": "scenes/scene-123/videos/20241008/abcd1234.mp4",
197
+ "filename": "walkthrough.mp4",
198
+ "size_bytes": 456789012,
199
+ "content_type": "video/mp4",
200
+ "queued_at": "2024-10-08T14:12:05Z",
201
+ "job_id": "rq-job-id"
202
+ }
203
+ ```
204
+ Clients should poll `GET /jobs/{job_id}` to track progress. When `job.result.frame_keys` is present the frame extraction succeeded.
205
+
206
+ ### Scene Media
207
+
208
+ #### `GET /scenes/{scene_id}/media`
209
+ Lists stored media records for a scene. Use `media_type=image` to filter to keyframes vs. `media_type=video` for source clips.
210
+
211
+ ```bash
212
+ curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/media?media_type=image&limit=50"
213
+ ```
214
+
215
+ Response (`SceneMediaListResponse`) includes the total slice returned plus ISO timestamps normalized to UTC.
216
+
217
+ #### `POST /scenes/{scene_id}/media`
218
+ Upserts media entries (usually called by workers after uploading files). Payload:
219
+
220
+ ```json
221
+ {
222
+ "entries": [
223
+ {
224
+ "file": "scenes/scene-123/keyframes/frame_000012.jpg",
225
+ "media_type": "image",
226
+ "captured_at": "2024-10-08T14:12:05Z"
227
+ },
228
+ {
229
+ "file": "scenes/scene-123/videos/20241008/abcd1234.mp4",
230
+ "media_type": "video"
231
+ }
232
+ ]
233
+ }
234
+ ```
235
+
236
+ - `media_type` is optional—when omitted the server infers it from the filename (`image` vs `video`).
237
+ - `captured_at` accepts ISO-8601 strings (UTC preferred). If omitted, the server stores `now()`.
238
+ - Existing rows are updated in-place thanks to an upsert on the `file` column.
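The filename-based inference can be approximated client-side as follows (the exact extension sets the server recognizes are an assumption):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".avi"}

def infer_media_type(key):
    """Guess image vs. video from a storage key's extension; None if unknown."""
    ext = Path(key).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return None
```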
239
+
240
+ Response (`SceneMediaBatchResponse`) summarizes how many entries were accepted:
241
+
242
+ ```json
243
+ {
244
+ "scene_id": "scene-123",
245
+ "accepted": 2,
246
+ "skipped": 0,
247
+ "files": [
248
+ "scenes/scene-123/keyframes/frame_000012.jpg",
249
+ "scenes/scene-123/videos/20241008/abcd1234.mp4"
250
+ ]
251
+ }
252
+ ```
253
+
254
+ ### Improver Monitoring
255
+
256
+ #### `GET /scenes/{scene_id}/improver/backlog`
257
+ Summarizes outstanding improver work for a scene. Default statuses are `pending`, `queued`, and `processing`, plus records that do not yet have a status. Use this to detect when the improver has drained the backlog.
258
+
259
+ Query parameters:
260
+ - `limit` (default `200`, max `2000`) — cap the number of instances returned.
261
+ - `status` — optional list to override which statuses count as “not yet processed.” Omit to use the defaults.
262
+ - `include_missing_status` (default `true`) — include rows with `status IS NULL`.
263
+
264
+ Response:
265
+
266
+ ```json
267
+ {
268
+ "scene_id": "scene-123",
269
+ "count": 2,
270
+ "instances": [
271
+ {
272
+ "scene_id": "scene-123",
273
+ "obj_id": "obj-1",
274
+ "instance_id": "inst-42",
275
+ "status": "pending",
276
+ "status_reason": "ambient",
277
+ "status_changed_at": "2024-10-08T14:15:06Z",
278
+ "last_event_at": "2024-10-08T14:15:06Z",
279
+ "created_at": "2024-10-08T13:59:40Z",
280
+ "data": {
281
+ "image_id": "img-998",
282
+ "bbox_xyxy": [0.1, 0.2, 0.5, 0.8]
283
+ }
284
+ }
285
+ ]
286
+ }
287
+ ```
288
+ When `count` is zero the improver has no pending work for that scene.
289
+
290
+ ### Improver & Beliefs
291
+
292
+ #### `POST /scenes/{scene_id}/objects/{obj_id}/instances/{instance_id}/improve_instance`
293
+ Queues improver workflow. Body adheres to `InstanceEvent` schema. Example:
294
+
295
+ ```bash
296
+ curl -X POST \
297
+ -H "Content-Type: application/json" \
298
+ -d '{
299
+ "scene_id": "scene-123",
300
+ "obj_id": "sofa-1",
301
+ "instance_id": "inst-456",
302
+ "image_id": "scenes/scene-123/images/20241008/sofa.png",
303
+ "bbox_xyxy": [0.1, 0.3, 0.6, 0.9]
304
+ }' \
305
+ https://scene-graph-mgr-api.fly.dev/scenes/scene-123/objects/sofa-1/instances/inst-456/improve_instance
306
+ ```
307
+
308
+ Response: `{ "enqueued": true, "job_id": "rq-job-id" }`.
309
+
310
+ The worker embeds, seeds Qdrant, and calls `scene_improver.tasks.improve_scene`.
311
+
312
+ #### `GET /scenes/{scene_id}/objects/{obj_id}/beliefs`
313
+ Returns latest belief payload (name Dirichlet, attribute betas, relations) or 404 if none stored.
314
+
315
+ #### `GET /scenes/{scene_id}/instances`
316
+ Returns persisted instances across every object in the scene. Query params:
317
+ - `limit` (1–5000, default 1000)
318
+ - `status` (optional, repeatable) — only include instances whose `status` matches one of the provided values.
319
+ - `exclude_status` (optional, repeatable) — omit instances whose `status` matches any of the provided values (e.g., `exclude_status=superseded` to skip reassigned instances).
320
+
321
+ Response mirrors the storage map:
322
+
323
+ ```json
324
+ {
325
+ "scene_id": "scene-123",
326
+ "count": 5,
327
+ "objects": {
328
+ "sofa-1": [{...}, {...}],
329
+ "lamp-4": [{...}]
330
+ }
331
+ }
332
+ ```
333
+
334
+ #### `GET /scenes/{scene_id}/objects/{obj_id}/instances`
335
+ Returns the persisted instance rows for the object. Query params:
336
+ - `limit` (1–1000, default 100)
337
+ - `status` (optional, repeatable) — only include instances whose `status` matches one of the provided values.
338
+ - `exclude_status` (optional, repeatable) — omit instances whose `status` matches any of the provided values.
339
+
340
+ Example request skipping superseded items:
341
+
342
+ ```bash
343
+ curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/objects/sofa-1/instances?limit=20&exclude_status=superseded"
344
+ ```
345
+
346
+ Response:
347
+
348
+ ```json
349
+ {
350
+ "scene_id": "scene-123",
351
+ "obj_id": "sofa-1",
352
+ "count": 2,
353
+ "instances": [
354
+ {
355
+ "id": "inst-456",
356
+ "image_id": "scenes/...",
357
+ "bbox_xyxy": [0.1,0.3,0.6,0.9],
358
+ "captured_at": 1728401000,
359
+ "status": "processed"
360
+ },
361
+ {
362
+ "id": "inst-123",
363
+ "image_id": "scenes/...",
364
+ "bbox_xyxy": [0.05,0.2,0.4,0.7],
365
+ "status": "pending"
366
+ }
367
+ ]
368
+ }
369
+ ```
370
+
371
+ #### `GET /scenes/{scene_id}/runlogs`
372
+ Query params:
373
+ - `limit` (1–1000, default 100)
374
+ - `obj_id` (optional filter)
375
+ - `instance_id` (optional filter)
376
+
377
+ Response contains `records` newest-first with `step`, `message`, `data`, timestamps, and run IDs.
378
+
379
+ #### `GET /scenes/{scene_id}/change-requests`
380
+ Inspect the change request queue. Query params:
381
+ - `state` (optional: `pending`, `applied`, `stale`)
382
+ - `limit` (1–200, default 50)
383
+
384
+ Returns an array mirroring the DB record with preconditions, payload, result, and the new `applied_summary` field:
385
+
386
+ ```json
387
+ [
388
+ {
389
+ "request_id": "95e8...",
390
+ "scene_id": "scene-123",
391
+ "obj_id": "table-1",
392
+ "requested_by": "belief-agent",
393
+ "state": "applied",
394
+ "confidence": 0.92,
395
+ "payload": {"operations": [...]},
396
+ "result": {"summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""},
397
+ "applied_at": "2024-11-19T20:14:32.123Z",
398
+ "applied_summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""
399
+ }
400
+ ]
401
+ ```
402
+
403
+ #### `GET /scenes/{scene_id}/change-summaries`
404
+ Lightweight feed of applied change blurbs, ordered by `applied_at` descending. Supports `limit` (1–200, default 50).
405
+
406
+ ```bash
407
+ curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/change-summaries?limit=20"
408
+ ```
409
+
410
+ ```json
411
+ [
412
+ {
413
+ "request_id": "95e8...",
414
+ "scene_id": "scene-123",
415
+ "obj_id": "table-1",
416
+ "applied_version": 1729300042000,
417
+ "applied_at": "2024-11-19T20:14:32.123Z",
418
+ "summary": "Updated object \"wooden table\" attribute \"size\" from \"medium\" to \"large\""
419
+ },
420
+ {
421
+ "request_id": "73f1...",
422
+ "scene_id": "scene-123",
423
+ "summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""
424
+ }
425
+ ]
426
+ ```
427
+
428
+ Use this endpoint for ambient “stream of consciousness” UIs showing how the improver evolves the scene.
429
+
430
+ ### Diffs
431
+
432
+ #### `GET /scenes/{scene_id}/diff`
433
+ Compare two versions (`from_version`, `to_version`). Optional `mode` query (`patch`, `summary`, `both`). Returns machine patch plus stats when requested.
434
+
435
+ #### `GET /scenes/{scene_id}/diff-semantic`
436
+ ID-aware diff returning ordered operations (append/remove/replace) with summary stats.
437
+
438
+ ### Stream3R Reconstruction Jobs
439
+
440
+ The Stream3R API wraps the reconstruction workers (pose pointmaps and full models) and provides idempotent enqueue, polling, and event streaming.
441
+
442
+ #### `POST /stream3r/jobs`
443
+ Submit a reconstruction job. Supported `job_type` values are `pose_pointmap` and `model_build`. Provide at least one frame (`url` or `path`) and optional `client_request_id` for idempotency (subsequent calls reuse the existing job and return `200 OK`).
444
+
445
+ ```bash
446
+ curl -X POST https://scene-graph-mgr-api.fly.dev/stream3r/jobs \
447
+ -H "Content-Type: application/json" \
448
+ -d '{
449
+ "job_type": "pose_pointmap",
450
+ "scene_id": "scene-123",
451
+ "frames": [
452
+ {"url": "https://cdn.example/scene-123/frame_0000.webp"},
453
+ {"path": "/mnt/captures/scene-123/frame_0001.png"}
454
+ ],
455
+ "session_settings": {"prediction_mode": "pointmap"},
456
+ "client_request_id": "scene-123-20241008-run1"
457
+ }'
458
+ ```
459
+
460
+ Response (`202 Accepted` on first submission):
461
+
462
+ ```json
463
+ {
464
+ "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
465
+ "job_type": "pose_pointmap",
466
+ "scene_id": "scene-123",
467
+ "status": "queued",
468
+ "created_at": "2024-10-08T14:12:05.417Z",
469
+ "payload": {
470
+ "job_type": "pose_pointmap",
471
+ "scene_id": "scene-123",
472
+ "frames": [
473
+ {"url": "https://cdn.example/scene-123/frame_0000.webp"},
474
+ {"path": "/mnt/captures/scene-123/frame_0001.png"}
475
+ ],
476
+ "session_settings": {"prediction_mode": "pointmap"}
477
+ }
478
+ }
479
+ ```
480
+
481
+ #### `GET /stream3r/jobs/{job_id}`
482
+ Fetch the canonical job record (backed by Postgres). Fields include:
483
+ - `status`: `queued`, `started`, `progress`, `finished`, or `failed`
484
+ - `result`: worker-published artifact manifest (S3 URLs, local paths)
485
+ - `error`: error string when status is `failed`
486
+ - timestamps (`created_at`, `started_at`, `completed_at`)
487
+
488
+ Typical successful response:
489
+
490
+ ```json
491
+ {
492
+ "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
493
+ "job_type": "model_build",
494
+ "scene_id": "scene-123",
495
+ "status": "finished",
496
+ "created_at": "2024-10-08T14:12:05.417Z",
497
+ "started_at": "2024-10-08T14:12:12.998Z",
498
+ "completed_at": "2024-10-08T14:24:39.221Z",
499
+ "result": {
500
+ "model_dir": "s3://bucket/scene-123/stream3r/models/20241008",
501
+ "summary_url": "s3://bucket/scene-123/stream3r/models/20241008/summary.json"
502
+ },
503
+ "error": null,
504
+ "client_request_id": "scene-123-20241008-run1"
505
+ }
506
+ ```
507
+
508
+ #### `GET /stream3r/jobs`
509
+ List jobs with optional filters:
510
+ - `scene_id`
511
+ - `job_type`
512
+ - `status`
513
+ - `limit` (1–200, default 50)
514
+ - `offset` (default 0)
515
+
516
+ Returns `{ "jobs": [...], "limit": 50, "offset": 0 }` with the same schema as the single-job response.
517
+
518
+ #### `GET /stream3r/jobs/{job_id}/events`
519
+ Server-Sent Events feed backed by Redis Streams. Use it for near-real-time updates in browser dashboards.
520
+
521
+ ```bash
522
+ curl --no-buffer \
523
+ -H "Accept: text/event-stream" \
524
+ "https://scene-graph-mgr-api.fly.dev/stream3r/jobs/d8f8a3fc-3aed-441c-ac78-2b953a9229bf/events"
525
+ ```
526
+
527
+ Events are emitted as standard SSE payloads:
528
+
529
+ ```
530
+ id: 1728409930123-0
531
+ event: progress
532
+ data: {"job_id":"d8f8a3fc-3aed-441c-ac78-2b953a9229bf","status":"progress","progress":65}
533
+
534
+ id: 1728409960456-0
535
+ event: finished
536
+ data: {"job_id":"d8f8a3fc-3aed-441c-ac78-2b953a9229bf","status":"finished","result_url":"s3://bucket/.../summary.json"}
537
+ ```
538
+
539
+ Reconnect with the last `id` to resume (`?last_id=<redis-stream-id>`). When the worker encounters an error the stream emits `event: failed` with an `error` field.
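For quick scripts, the event frames above can be parsed with a few lines of Python (production clients should prefer a dedicated SSE library that also handles `retry:` fields, comments, and multi-line `data:`):

```python
def parse_sse(payload):
    """Split a text/event-stream body into dicts of its id/event/data fields.
    Blank lines delimit events; field values keep any colons after the first."""
    events, current = [], {}
    for line in payload.splitlines():
        if not line.strip():
            if current:
                events.append(current)
                current = {}
            continue
        field, _, value = line.partition(":")
        current[field.strip()] = value.strip()
    if current:
        events.append(current)
    return events
```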
540
+
541
+ > **Implementation note:** the current worker stub still returns `status="failed"` with `error="stream3r worker not implemented"` until the GPU-backed handler ships. Downstream clients should surface the error text to operators and may retry later.
542
+
543
+ #### `GET /stream3r/models/{scene_id}/presign`
544
+ Fetch a presigned download URL for the most recent `model_build` job's `scene.glb`. Optional query params:
545
+ - `job_id` — force a specific job (must belong to the scene).
546
+ - `expires` — TTL in seconds (default 900, range 60–86400).
547
+
548
+ ```bash
549
+ curl "https://scene-graph-mgr-api.fly.dev/stream3r/models/scene-123/presign?expires=600"
550
+ ```
551
+
552
+ Response:
553
+
554
+ ```json
555
+ {
556
+ "scene_id": "scene-123",
557
+ "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
558
+ "key": "scenes/scene-123/stream3r/models/20241008/scene.glb",
559
+ "url": "https://s3.amazonaws.com/...signature...",
560
+ "expires_in": 600
561
+ }
562
+ ```
563
+
564
+ Returns `404` if no successful model build exists or the job did not publish a GLB artifact.
565
+
566
+ ### Jobs & Internal Ops
567
+
568
+ #### `GET /jobs/{job_id}`
569
+ Inspect RQ job status. Returns timestamps, result payload (if finished), and truncated stack trace when failed.
570
+
571
+ #### `POST /internal/commit-version`
572
+ Worker-only commit. Body:
573
+ ```json
574
+ {
575
+ "scene_location_id": "scene-123",
576
+ "scene_graph": {...},
577
+ "base_version": 1728400000000,
578
+ "meta": {"source": "worker"}
579
+ }
580
+ ```
581
+ Requires `x-internal-secret` header when enabled. Creates new version, broadcasts `scene.update`, and publishes on Redis pub/sub.
582
+
583
+ ### WebSocket Stream
+ 
+ #### `WS /ws?channel=all|{scene_id}`
+ Receives JSON events when scenes change (`scene.create`, `scene.put`, `scene.patch`, `scene.update`). Use it to refresh UI state in real time.
+ 
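A minimal dispatch sketch for the four event kinds, assuming each message carries a `type` field naming the event (the payload shape beyond the event names is an assumption, not documented here).

```python
def dispatch(event: dict, handlers: dict) -> str:
    """Route a scene event (scene.create/put/patch/update) to its handler."""
    kind = event.get("type")
    handler = handlers.get(kind)
    if handler is None:
        return "ignored"
    handler(event)
    return kind

seen = []
handlers = {t: seen.append for t in ("scene.create", "scene.put", "scene.patch", "scene.update")}
result = dispatch({"type": "scene.update", "scene_id": "scene-123"}, handlers)
```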
+ ---
+ 
+ ## Notes & Best Practices
+ 
+ - All timestamps are UTC ISO strings.
+ - Scene writes (`PUT`, `PATCH`, `create`, `commit-version`) broadcast on WebSocket and publish to the Redis channel `scene_events`.
+ - Object storage paths follow `scenes/{scene_id}/images/...`; the presign endpoint enforces this prefix.
+ - Scene graph payloads no longer embed instance blobs; use the `/instances` endpoint for that data.
+ - Postgres storage is required; filesystem fallbacks have been removed from the API/worker flows.
+ - Queue jobs default to a 15-minute timeout (image) or 30-minute timeout (batch). Track job progress via `/jobs/{job_id}` or the Redis CLI.
+ - Improver run logs are persisted to Postgres (if configured) and mirrored to JSONL under `SCENE_RUN_LOG_DIR`.
notes.md CHANGED
@@ -2,3 +2,5 @@
  
  **Manually Clear the GPU Lock from REDIS:
  redis-cli -u "$REDIS_URL" DEL gpu:lock
+ 
+ export INTERNAL_NOTIFY_SECRET='82d4acd547e449fe';
stream3r/utils/__pycache__/visual_utils.cpython-311.pyc CHANGED
Binary files a/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc and b/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc differ
 
stream3r/utils/visual_utils.py CHANGED
@@ -327,6 +327,7 @@ def predictions_to_glb(
      reinflate_seed: int | None = None,
      ceiling_percentile: float | None = None,
      ceiling_margin: float = 0.05,
+     ceiling_z_max: float | None = None,
  ) -> trimesh.Scene:
      """
      Converts predictions to a 3D scene represented as a GLB file.
@@ -364,6 +365,7 @@ def predictions_to_glb(
          reinflate_seed (Optional[int]): RNG seed for deterministic reinflation.
          ceiling_percentile (Optional[float]): Remove points above this Z percentile (0-100).
          ceiling_margin (float): Margin subtracted from percentile cutoff (meters).
+         ceiling_z_max (Optional[float]): Remove points with Z >= this absolute height (meters).
  
      Returns:
          trimesh.Scene: Processed 3D scene containing point cloud and cameras
@@ -544,6 +546,20 @@ def predictions_to_glb(
          colors_rgb = colors_rgb[keep_mask]
          conf_used = conf_used[keep_mask]
  
+     if ceiling_z_max is not None and vertices_3d.size:
+         try:
+             z_limit = float(ceiling_z_max)
+         except (TypeError, ValueError):
+             z_limit = None
+         if z_limit is not None:
+             keep_mask = vertices_3d[:, 2] < z_limit
+             if not np.any(keep_mask):
+                 keep_mask = vertices_3d[:, 2] <= z_limit
+             if np.any(keep_mask) and np.count_nonzero(keep_mask) < vertices_3d.shape[0]:
+                 vertices_3d = vertices_3d[keep_mask]
+                 colors_rgb = colors_rgb[keep_mask]
+                 conf_used = conf_used[keep_mask]
+ 
      if effective_voxel_size is not None and voxel_after_conf and vertices_3d.size:
          before_count = vertices_3d.shape[0]
          vertices_3d, colors_rgb, conf_used = voxel_reduce(
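The new `ceiling_z_max` cutoff can be exercised in isolation; a sketch mirroring the hunk's strict-then-inclusive masking (the standalone function name is illustrative).

```python
import numpy as np

def apply_ceiling_z_max(vertices: np.ndarray, z_limit: float) -> np.ndarray:
    """Drop points at or above z_limit, falling back to <= so the cloud never empties."""
    keep = vertices[:, 2] < z_limit
    if not np.any(keep):              # every point at/above the limit: keep ties instead
        keep = vertices[:, 2] <= z_limit
    if np.any(keep) and np.count_nonzero(keep) < vertices.shape[0]:
        return vertices[keep]
    return vertices

pts = np.array([[0.0, 0.0, 0.5], [0.0, 0.0, 2.4], [0.0, 0.0, 3.1]])
trimmed = apply_ceiling_z_max(pts, 2.5)   # keeps the two points below 2.5 m
```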
stream3r/worker/__init__.py CHANGED
@@ -8,9 +8,10 @@ _settings = WorkerSettings.from_env()
  if _settings.default_job_timeout and _settings.default_job_timeout > 0:
      Queue.DEFAULT_TIMEOUT = _settings.default_job_timeout
  
- from .tasks import model_build_job, pose_pointmap_job  # noqa: E402
+ from .tasks import keyframe_selection_job, model_build_job, pose_pointmap_job  # noqa: E402
  
  __all__ = [
      "pose_pointmap_job",
      "model_build_job",
+     "keyframe_selection_job",
  ]
stream3r/worker/config.py CHANGED
@@ -68,6 +68,7 @@ class WorkerSettings:
  
      pose_queue: str = "pose_pointmap"
      model_queue: str = "model_build"
+     keyframe_queue: str = "keyframe_selection"
  
      gpu_lock_key: str = "gpu:lock"
      gpu_lock_timeout: int = 3600
@@ -113,6 +114,7 @@ class WorkerSettings:
  
      scene_media_api_base_url: str | None = None
      scene_media_api_token: str | None = None
+     scene_media_api_secret: str | None = None
      scene_media_page_size: int = 200
      stream_window_size: int = 14
      max_frames_per_job: int = 0
@@ -128,7 +130,10 @@ class WorkerSettings:
      keyframe_coverage_voxel_size: float = 0.05
      keyframe_coverage_max_points: int = 5000
      keyframe_min_gain_ratio: float = 0.01
-     keyframe_full_mode_max_frames: int = 16
+     keyframe_full_mode_max_frames: int = 24
+     keyframe_extract_fps: float = 6.0
+     keyframe_extract_max_frames: int = 1200
+     keyframe_upload_dir: str = "images/keyframes"
  
      @classmethod
      def from_env(cls) -> "WorkerSettings":
@@ -143,6 +148,7 @@ class WorkerSettings:
          ),
          "pose_queue": os.getenv("STREAM3R_QUEUE_POSE", base.pose_queue),
          "model_queue": os.getenv("STREAM3R_QUEUE_MODEL", base.model_queue),
+         "keyframe_queue": os.getenv("STREAM3R_QUEUE_KEYFRAME", base.keyframe_queue),
          "gpu_lock_key": os.getenv("STREAM3R_GPU_LOCK_KEY", base.gpu_lock_key),
          "gpu_lock_timeout": _env_int("STREAM3R_GPU_LOCK_TIMEOUT", base.gpu_lock_timeout),
          "gpu_lock_blocking_timeout": _env_int(
@@ -212,6 +218,13 @@ class WorkerSettings:
              default=base.scene_media_api_token,
          )
          or None,
+         "scene_media_api_secret": _env_value(
+             "STREAM3R_MEDIA_API_SECRET",
+             "MEDIA_API_SECRET",
+             "INTERNAL_NOTIFY_SECRET",
+             default=base.scene_media_api_secret,
+         )
+         or None,
          "scene_media_page_size": _env_int(
              "STREAM3R_MEDIA_PAGE_SIZE", base.scene_media_page_size
          ),
@@ -260,6 +273,15 @@ class WorkerSettings:
          "keyframe_full_mode_max_frames": _env_int(
              "STREAM3R_KEYFRAME_FULL_MAX_FRAMES", base.keyframe_full_mode_max_frames
          ),
+         "keyframe_extract_fps": float(
+             os.getenv("STREAM3R_KEYFRAME_EXTRACT_FPS", base.keyframe_extract_fps)
+         ),
+         "keyframe_extract_max_frames": _env_int(
+             "STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", base.keyframe_extract_max_frames
+         ),
+         "keyframe_upload_dir": os.getenv(
+             "STREAM3R_KEYFRAME_UPLOAD_DIR", base.keyframe_upload_dir
+         ),
      }
  
      return cls(**kwargs)
stream3r/worker/keyframes.py ADDED
@@ -0,0 +1,408 @@
+ """Key frame selection utilities."""
+ 
+ from __future__ import annotations
+ 
+ import logging
+ from dataclasses import dataclass, field
+ from datetime import datetime
+ from pathlib import Path
+ from typing import Any, Iterable, Mapping
+ 
+ import cv2
+ import numpy as np
+ 
+ from .config import WorkerSettings
+ from .pipeline import run_stream3r_inference
+ from .runtime import WorkerRuntime
+ 
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass(slots=True)
+ class FrameRecord:
+     index: int
+     frame_id: str
+     path: Path
+     source: str | None = None
+     timestamp: str | None = None
+     metadata: dict[str, Any] = field(default_factory=dict)
+ 
+ 
+ @dataclass(slots=True)
+ class KeyframeSelectionResult:
+     indices: list[int]
+     diagnostics: list[dict[str, Any]]
+     top_k: int
+ 
+ 
+ def pose_confidence(predictions: Mapping[str, np.ndarray]) -> np.ndarray | None:
+     if "world_points_conf" in predictions:
+         return np.asarray(predictions["world_points_conf"], dtype=np.float32)
+     if "depth_conf" in predictions:
+         return np.asarray(predictions["depth_conf"], dtype=np.float32)
+     return None
+ 
+ 
+ def _camera_poses(extrinsic: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
+     matrices = np.asarray(extrinsic, dtype=np.float64)
+     if matrices.ndim != 3 or matrices.shape[1:] != (3, 4):
+         raise ValueError("Extrinsic array must have shape (N, 3, 4)")
+     count = matrices.shape[0]
+     rotations = np.empty((count, 3, 3), dtype=np.float64)
+     translations = np.empty((count, 3), dtype=np.float64)
+     for idx in range(count):
+         mat = np.eye(4, dtype=np.float64)
+         mat[:3, :4] = matrices[idx]
+         cam_to_world = np.linalg.inv(mat)
+         rotations[idx] = cam_to_world[:3, :3]
+         translations[idx] = cam_to_world[:3, 3]
+     return rotations, translations
+ 
+ 
+ def _compute_motion_deltas(rotations: np.ndarray, translations: np.ndarray, rot_weight: float) -> np.ndarray:
+     count = rotations.shape[0]
+     deltas = np.zeros(count, dtype=np.float64)
+     if count <= 1:
+         return deltas
+     for idx in range(1, count):
+         delta_t = np.linalg.norm(translations[idx] - translations[idx - 1])
+         rel = rotations[idx - 1].T @ rotations[idx]
+         trace = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
+         delta_r = float(np.arccos(trace))
+         deltas[idx] = delta_t + rot_weight * delta_r
+     return deltas
+ 
+ 
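The rotation term of the motion delta is the geodesic angle recovered from the trace of the relative rotation; a standalone check of that formula (the helper name is illustrative).

```python
import numpy as np

def rotation_angle(r_prev: np.ndarray, r_next: np.ndarray) -> float:
    """Geodesic angle between consecutive rotations, as used in the motion delta."""
    rel = r_prev.T @ r_next
    trace = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(trace))

identity = np.eye(3)
# 90-degree rotation about the z axis
rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
angle = rotation_angle(identity, rz90)   # pi/2
```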
+ def _hash_quantized_voxels(coords: np.ndarray) -> np.ndarray:
+     coords = coords.astype(np.int64, copy=False)
+     primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
+     return coords @ primes
+ 
+ 
+ def _frame_voxel_sets(
+     world_points: np.ndarray,
+     confidence: np.ndarray,
+     *,
+     threshold: float,
+     voxel_size: float,
+     max_points: int,
+ ) -> tuple[list[set[int]], int]:
+     rng = np.random.default_rng(42)
+     frames = world_points.shape[0]
+     voxel_sets: list[set[int]] = []
+     global_union: set[int] = set()
+     if voxel_size <= 0.0:
+         return [set() for _ in range(frames)], 0
+     for idx in range(frames):
+         conf_frame = confidence[idx]
+         mask = conf_frame >= threshold
+         if not np.any(mask):
+             voxel_sets.append(set())
+             continue
+         points = world_points[idx][mask]
+         if points.shape[0] > max_points:
+             sample_idx = rng.choice(points.shape[0], max_points, replace=False)
+             points = points[sample_idx]
+         quantized = np.floor(points / voxel_size).astype(np.int64, copy=False)
+         hashes = np.unique(_hash_quantized_voxels(quantized))
+         voxel_set = set(int(v) for v in hashes.tolist())
+         voxel_sets.append(voxel_set)
+         global_union.update(voxel_set)
+     return voxel_sets, len(global_union)
+ 
+ 
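The coverage metric quantizes points into voxels and hashes each cell with the same prime triple; a minimal demonstration (function name illustrative) showing two nearby points collapsing into one cell.

```python
import numpy as np

def voxel_hashes(points: np.ndarray, voxel_size: float) -> set[int]:
    """Quantize points to a voxel grid and hash each occupied cell."""
    quantized = np.floor(points / voxel_size).astype(np.int64)
    primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
    return set(int(h) for h in np.unique(quantized @ primes))

# Two points in the same 5 cm voxel collapse to one hash; the third lands elsewhere.
pts = np.array([[0.01, 0.01, 0.01], [0.02, 0.02, 0.02], [1.0, 1.0, 1.0]])
cells = voxel_hashes(pts, 0.05)
```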
+ def _select_motion_indices(
+     motion_deltas: np.ndarray,
+     *,
+     threshold: float,
+     min_gap: int,
+     max_gap: int,
+ ) -> tuple[list[int], dict[int, dict[str, float]]]:
+     total_frames = motion_deltas.shape[0]
+     if total_frames == 0:
+         return [], {}
+     selected = [0]
+     diagnostics: dict[int, dict[str, float]] = {0: {"motion_delta": 0.0, "cum_motion": 0.0}}
+     cumulative = 0.0
+     gap = 0
+     for idx in range(1, total_frames):
+         delta = float(motion_deltas[idx])
+         cumulative += delta
+         gap += 1
+         if gap < max(1, min_gap):
+             continue
+         should_select = cumulative >= threshold
+         if max_gap > 0 and gap >= max_gap:
+             should_select = True
+         if should_select:
+             selected.append(idx)
+             diagnostics[idx] = {"motion_delta": delta, "cum_motion": cumulative}
+             cumulative = 0.0
+             gap = 0
+     if selected[-1] != total_frames - 1:
+         selected.append(total_frames - 1)
+         diagnostics.setdefault(total_frames - 1, {"motion_delta": float(motion_deltas[-1]), "cum_motion": cumulative})
+     return selected, diagnostics
+ 
+ 
+ def select_keyframes_motion_coverage(
+     frame_records: list[FrameRecord],
+     predictions: Mapping[str, np.ndarray],
+     settings: WorkerSettings,
+     requested_top_k: int,
+ ) -> KeyframeSelectionResult | None:
+     extrinsic = np.asarray(predictions.get("extrinsic"))
+     if extrinsic.size == 0:
+         return None
+     rotations, translations = _camera_poses(extrinsic)
+     motion_deltas = _compute_motion_deltas(rotations, translations, settings.keyframe_rotation_weight)
+     motion_indices, motion_diag = _select_motion_indices(
+         motion_deltas,
+         threshold=settings.keyframe_motion_threshold,
+         min_gap=max(1, settings.keyframe_min_gap_frames),
+         max_gap=max(0, settings.keyframe_max_gap_frames),
+     )
+ 
+     total_frames = len(frame_records)
+     confidence = pose_confidence(predictions)
+     world_points = predictions.get("world_points")
+     if world_points is None:
+         world_points = predictions.get("world_points_from_depth")
+ 
+     voxel_sets: list[set[int]] = [set() for _ in range(total_frames)]
+     total_voxels = 0
+     mean_conf = np.zeros(total_frames, dtype=np.float32)
+     if confidence is not None:
+         mean_conf = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
+ 
+     if confidence is not None and world_points is not None:
+         voxel_sets, total_voxels = _frame_voxel_sets(
+             np.asarray(world_points),
+             np.asarray(confidence),
+             threshold=settings.keyframe_coverage_confidence,
+             voxel_size=settings.keyframe_coverage_voxel_size,
+             max_points=max(1000, settings.keyframe_coverage_max_points),
+         )
+ 
+     total_voxels = max(total_voxels, 1)
+     top_k = requested_top_k if requested_top_k > 0 else settings.keyframe_default_top_k
+     top_k = max(min(top_k, total_frames), len(motion_indices))
+ 
+     selected_set: set[int] = set(motion_indices)
+     diagnostics: dict[int, dict[str, Any]] = {}
+     covered: set[int] = set()
+ 
+     for idx in motion_indices:
+         gain_count = len(voxel_sets[idx] - covered) if voxel_sets[idx] else 0
+         gain_ratio = gain_count / total_voxels
+         covered.update(voxel_sets[idx])
+         diagnostics[idx] = {
+             "frame_id": frame_records[idx].frame_id,
+             "frame_index": frame_records[idx].index,
+             "reason": "motion",
+             "motion_delta": float(motion_deltas[idx]),
+             "cum_motion": float(motion_diag.get(idx, {}).get("cum_motion", 0.0)),
+             "coverage_gain_ratio": float(gain_ratio),
+             "coverage_gain_count": int(gain_count),
+             "mean_confidence": float(mean_conf[idx]) if confidence is not None else None,
+         }
+ 
+     if len(selected_set) < top_k and total_voxels > 0:
+         min_gain_ratio = settings.keyframe_min_gain_ratio
+         remaining = [i for i in range(total_frames) if i not in selected_set and voxel_sets[i]]
+         while remaining and len(selected_set) < top_k:
+             best_idx = -1
+             best_gain = -1
+             best_ratio = -1.0
+             for idx in remaining:
+                 gain = len(voxel_sets[idx] - covered)
+                 if gain <= 0:
+                     continue
+                 ratio = gain / total_voxels
+                 if ratio > best_ratio or (np.isclose(ratio, best_ratio) and gain > best_gain):
+                     best_idx = idx
+                     best_gain = gain
+                     best_ratio = ratio
+             if best_idx == -1 or best_ratio < min_gain_ratio:
+                 break
+             selected_set.add(best_idx)
+             covered.update(voxel_sets[best_idx])
+             diagnostics[best_idx] = {
+                 "frame_id": frame_records[best_idx].frame_id,
+                 "frame_index": frame_records[best_idx].index,
+                 "reason": "coverage",
+                 "motion_delta": float(motion_deltas[best_idx]),
+                 "cum_motion": float(motion_diag.get(best_idx, {}).get("cum_motion", 0.0)),
+                 "coverage_gain_ratio": float(best_ratio),
+                 "coverage_gain_count": int(best_gain),
+                 "mean_confidence": float(mean_conf[best_idx]) if confidence is not None else None,
+             }
+             remaining.remove(best_idx)
+ 
+     if requested_top_k > 0 and len(selected_set) > requested_top_k:
+         coverage_candidates = [idx for idx in selected_set if diagnostics[idx]["reason"] == "coverage"]
+         coverage_candidates.sort(key=lambda idx: diagnostics[idx].get("coverage_gain_ratio", 0.0))
+         while len(selected_set) > requested_top_k and coverage_candidates:
+             drop_idx = coverage_candidates.pop(0)
+             selected_set.remove(drop_idx)
+             diagnostics.pop(drop_idx, None)
+ 
+     final_indices = sorted(selected_set)
+     final_diags = [diagnostics[idx] for idx in final_indices]
+     return KeyframeSelectionResult(indices=final_indices, diagnostics=final_diags, top_k=len(final_indices))
+ 
+ 
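The coverage phase is a classic greedy max-coverage loop: repeatedly take the frame whose voxel set adds the most unseen cells, stopping below a minimum gain ratio. A stripped-down sketch of that loop (names illustrative, no tie-breaking on raw gain):

```python
def greedy_coverage(voxel_sets: list[set[int]], top_k: int, min_gain_ratio: float) -> list[int]:
    """Greedily pick frames whose voxel sets add the most unseen coverage."""
    total = len(set().union(*voxel_sets)) or 1
    covered: set[int] = set()
    chosen: list[int] = []
    remaining = list(range(len(voxel_sets)))
    while remaining and len(chosen) < top_k:
        best_idx, best_gain = -1, 0
        for idx in remaining:
            gain = len(voxel_sets[idx] - covered)
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx == -1 or best_gain / total < min_gain_ratio:
            break  # no frame adds enough new coverage
        chosen.append(best_idx)
        covered.update(voxel_sets[best_idx])
        remaining.remove(best_idx)
    return chosen

sets = [{1, 2, 3}, {2, 3}, {4, 5, 6, 7}, {7, 8}]
picked = greedy_coverage(sets, top_k=3, min_gain_ratio=0.05)
```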
+ def run_keyframe_prepass(
+     *,
+     runtime: WorkerRuntime,
+     payload: Mapping[str, Any],
+     frame_records: list[FrameRecord],
+     mode: str,
+     streaming: bool,
+     window_size: int | None,
+ ) -> KeyframeSelectionResult | None:
+     if len(frame_records) <= 1:
+         return None
+ 
+     settings = runtime.settings
+     top_k_payload = int(payload.get("prepass_top_k") or payload.get("top_k_frames") or payload.get("top_k") or 0)
+ 
+     try:
+         inference = run_stream3r_inference(
+             runtime=runtime,
+             image_paths=[record.path for record in frame_records],
+             mode=mode,
+             streaming=streaming,
+             cache_output_path=None,
+             progress_cb=None,
+             window_size=window_size if streaming and mode == "window" else None,
+         )
+     except Exception:
+         logger.exception("Keyframe pre-pass inference failed")
+         return None
+ 
+     try:
+         return select_keyframes_motion_coverage(
+             frame_records,
+             inference.predictions,
+             settings,
+             requested_top_k=top_k_payload,
+         )
+     finally:
+         del inference
+ 
+ 
+ def extract_video_frames(
+     video_path: Path,
+     output_dir: Path,
+     *,
+     target_fps: float | None = None,
+     max_frames: int | None = None,
+ ) -> tuple[list[FrameRecord], float]:
+     if not video_path.exists():
+         raise FileNotFoundError(f"Video file not found: {video_path}")
+ 
+     output_dir.mkdir(parents=True, exist_ok=True)
+     cap = cv2.VideoCapture(str(video_path))
+     if not cap.isOpened():
+         raise RuntimeError(f"Failed to open video: {video_path}")
+ 
+     native_fps = cap.get(cv2.CAP_PROP_FPS)
+     if not native_fps or native_fps <= 0:
+         native_fps = 30.0
+     frame_interval = 1
+     if target_fps and target_fps > 0:
+         frame_interval = max(1, int(round(native_fps / target_fps)))
+ 
+     frame_records: list[FrameRecord] = []
+     total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
+     frame_idx = 0
+     extracted = 0
+     success, frame = cap.read()
+     while success:
+         if frame_idx % frame_interval == 0:
+             frame_id = f"frame_{extracted:06d}"
+             frame_path = output_dir / f"{frame_id}.jpg"
+             if not cv2.imwrite(str(frame_path), frame):
+                 cap.release()
+                 raise RuntimeError(f"Failed to write frame: {frame_path}")
+             timestamp_s = frame_idx / native_fps
+             frame_records.append(
+                 FrameRecord(
+                     index=extracted,
+                     frame_id=frame_id,
+                     path=frame_path,
+                     metadata={"frame_number": frame_idx, "timestamp_s": timestamp_s},
+                 )
+             )
+             extracted += 1
+             if max_frames and extracted >= max_frames:
+                 break
+         frame_idx += 1
+         success, frame = cap.read()
+ 
+     cap.release()
+     if not frame_records:
+         raise RuntimeError("No frames extracted from video")
+ 
+     return frame_records, native_fps
+ 
+ 
+ def linear_sample_indices(total: int, desired: int) -> list[int]:
+     if desired <= 0 or total <= desired:
+         return list(range(total))
+     step = total / desired
+     return [min(total - 1, int(round(i * step))) for i in range(desired)]
+ 
+ 
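The linear fallback sampler can be exercised directly; the body below is copied from the diff so its behavior can be checked standalone.

```python
def linear_sample_indices(total: int, desired: int) -> list[int]:
    """Evenly spaced frame indices, clamped to the last frame."""
    if desired <= 0 or total <= desired:
        return list(range(total))
    step = total / desired
    return [min(total - 1, int(round(i * step))) for i in range(desired)]

picks = linear_sample_indices(100, 5)   # every 20th frame
```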
+ def build_keyframe_uploads(
+     runtime: WorkerRuntime,
+     scene_id: str,
+     selected_records: Iterable[FrameRecord],
+     diagnostics: list[dict[str, Any]],
+     *,
+     subdir: str,
+ ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+     diag_by_index = {entry.get("frame_index"): entry for entry in diagnostics}
+     storage_entries: list[dict[str, Any]] = []
+     media_entries: list[dict[str, Any]] = []
+ 
+     for record in selected_records:
+         diag = diag_by_index.get(record.index, {})
+         filename = f"{record.frame_id}.jpg"
+         key = runtime.storage.build_key(scene_id, subdir, filename)
+         uri = runtime.storage.upload_file(record.path, key, content_type="image/jpeg")
+         storage_entries.append(
+             {
+                 "frame_id": record.frame_id,
+                 "frame_index": record.index,
+                 "url": uri,
+                 "storage_key": key,
+                 "diagnostics": diag,
+             }
+         )
+ 
+         media_entries.append(
+             {
+                 "media_type": "image",
+                 "file": key,
+                 "captured_at": _diagnostic_captured_at(record, diag),
+             }
+         )
+ 
+     return storage_entries, media_entries
+ 
+ 
+ def _diagnostic_captured_at(record: FrameRecord, diag: Mapping[str, Any]) -> str | None:
+     if record.timestamp:
+         return record.timestamp
+     ts = diag.get("timestamp") or record.metadata.get("timestamp")
+     if isinstance(ts, str):
+         return ts
+     if isinstance(ts, (int, float)):
+         return datetime.utcfromtimestamp(float(ts)).isoformat() + "Z"
+     timestamp_s = record.metadata.get("timestamp_s")
+     if isinstance(timestamp_s, (int, float)):
+         return datetime.utcfromtimestamp(float(timestamp_s)).isoformat() + "Z"
+     return None
stream3r/worker/main.py CHANGED
@@ -77,7 +77,7 @@ def main() -> None:
  if settings.default_job_timeout and settings.default_job_timeout > 0:
      Queue.DEFAULT_TIMEOUT = settings.default_job_timeout
  
- args = _parse_args([settings.pose_queue, settings.model_queue])
+ args = _parse_args([settings.pose_queue, settings.model_queue, settings.keyframe_queue])
  logging.basicConfig(level=getattr(logging, str(args.log_level).upper(), logging.INFO))
  
  runtime = get_runtime()
stream3r/worker/tasks.py CHANGED
@@ -11,7 +11,6 @@ import shutil
11
  import tempfile
12
  import traceback
13
  import uuid
14
- from dataclasses import dataclass, field
15
  from datetime import datetime, timezone
16
  from pathlib import Path
17
  from contextlib import nullcontext
@@ -24,6 +23,15 @@ from rq import get_current_job
24
 
25
  from stream3r.utils.visual_utils import predictions_to_glb
26
 
 
 
 
 
 
 
 
 
 
27
  from .pipeline import InferenceResult, run_stream3r_inference
28
  from .runtime import WorkerRuntime, get_runtime
29
 
@@ -51,25 +59,6 @@ def _as_int(value: Any, default: int) -> int:
51
  return int(value)
52
  except (TypeError, ValueError):
53
  return default
54
-
55
-
56
- @dataclass(slots=True)
57
- class FrameRecord:
58
- index: int
59
- frame_id: str
60
- path: Path
61
- source: str | None = None
62
- timestamp: str | None = None
63
- metadata: dict[str, Any] = field(default_factory=dict)
64
-
65
-
66
- @dataclass(slots=True)
67
- class KeyframeSelectionResult:
68
- indices: list[int]
69
- diagnostics: list[dict[str, Any]]
70
- top_k: int
71
-
72
-
73
  class ProgressTracker:
74
  """Aggregates frame progress to percentage updates."""
75
 
@@ -124,6 +113,53 @@ def _write_base64(content: str, destination: Path) -> None:
124
  destination.write_bytes(data)
125
 
126
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
127
  def _resolve_frame_entry(entry: Any, *, index: int, dest_dir: Path) -> FrameRecord:
128
  metadata: dict[str, Any] = {}
129
  timestamp = None
@@ -364,16 +400,6 @@ def _collect_frames_from_scene_media(
364
  offset += request_limit
365
 
366
  return records
367
-
368
-
369
- def _pose_confidence(predictions: Mapping[str, np.ndarray]) -> np.ndarray | None:
370
- if "world_points_conf" in predictions:
371
- return np.asarray(predictions["world_points_conf"], dtype=np.float32)
372
- if "depth_conf" in predictions:
373
- return np.asarray(predictions["depth_conf"], dtype=np.float32)
374
- return None
375
-
376
-
377
  def _save_pointmaps(
378
  *,
379
  runtime: WorkerRuntime,
@@ -389,7 +415,7 @@ def _save_pointmaps(
389
  raise RuntimeError("Predictions missing world points")
390
 
391
  world_points = np.asarray(world_points)
392
- confidence = _pose_confidence(predictions)
393
  if confidence is None:
394
  confidence = np.ones(world_points.shape[:-1], dtype=np.float32)
395
 
@@ -669,7 +695,7 @@ def _select_keyframes_motion_coverage(
669
  max_gap=max(0, settings.keyframe_max_gap_frames),
670
  )
671
  total_frames = len(frame_records)
672
- confidence = _pose_confidence(predictions)
673
  world_points = predictions.get("world_points")
674
  if world_points is None:
675
  world_points = predictions.get("world_points_from_depth")
@@ -756,7 +782,7 @@ def _compute_selected_frames(
756
  ) -> list[dict[str, Any]]:
757
  if top_k <= 0:
758
  return []
759
- confidence = _pose_confidence(predictions)
760
  if confidence is None:
761
  return []
762
  scores = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
@@ -831,6 +857,11 @@ def _save_scene_glb(
831
  ceiling_margin_value = float(ceiling_margin_value) if ceiling_margin_value is not None else 0.05
832
  except (TypeError, ValueError):
833
  ceiling_margin_value = 0.05
 
 
 
 
 
834
 
835
  scene = predictions_to_glb(
836
  dict(predictions),
@@ -844,6 +875,7 @@ def _save_scene_glb(
844
  prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
845
  ceiling_percentile=ceiling_percentile_value,
846
  ceiling_margin=ceiling_margin_value,
 
847
  )
848
  scene.export(file_obj=str(local_file))
849
  key = runtime.storage.build_key(
@@ -1302,6 +1334,174 @@ def model_build_job(payload: Mapping[str, Any]) -> dict[str, Any]:
1302
  return _execute_job("model_build", payload, _handle_model_build)
1303
 
1304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1305
  def _handle_model_build(
1306
  *,
1307
  runtime: WorkerRuntime,
 
11
  import tempfile
12
  import traceback
13
  import uuid
 
14
  from datetime import datetime, timezone
15
  from pathlib import Path
16
  from contextlib import nullcontext
 
23
 
24
  from stream3r.utils.visual_utils import predictions_to_glb
25
 
26
+ from .keyframes import (
27
+ FrameRecord,
28
+ KeyframeSelectionResult,
29
+ build_keyframe_uploads,
30
+ extract_video_frames,
31
+ linear_sample_indices,
32
+ pose_confidence,
33
+ run_keyframe_prepass,
34
+ )
35
  from .pipeline import InferenceResult, run_stream3r_inference
36
  from .runtime import WorkerRuntime, get_runtime
37
 
 
59
  return int(value)
60
  except (TypeError, ValueError):
61
  return default
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
62
  class ProgressTracker:
63
  """Aggregates frame progress to percentage updates."""
64
 
 
113
  destination.write_bytes(data)
114
 
115
 
116
+ def _register_scene_media_entries(runtime: WorkerRuntime, scene_id: str, entries: list[dict[str, Any]]) -> None:
117
+ if not entries:
118
+ return
119
+ base_url = runtime.settings.scene_media_api_base_url
120
+ if not base_url:
121
+ logger.info("Scene media API base URL not configured; skipping registration for %s", scene_id)
122
+ return
123
+
124
+ url = f"{base_url.rstrip('/')}/scenes/{scene_id}/media"
125
+ headers: dict[str, str] = {"Content-Type": "application/json"}
126
+ token = runtime.settings.scene_media_api_token
127
+ if token:
128
+ headers["Authorization"] = f"Bearer {token}"
129
+ secret = runtime.settings.scene_media_api_secret
130
+ if secret:
131
+ headers["x-internal-secret"] = secret
132
+
133
+ try:
134
+ response = requests.post(url, json={"entries": entries}, headers=headers, timeout=30)
135
+ if response.status_code == 405:
136
+ logger.info(
137
+ "Scene media API does not accept POST at %s (status %s); skipping registration",
138
+ url,
139
+ response.status_code,
140
+ )
141
+ return
142
+ response.raise_for_status()
143
+ except requests.HTTPError as exc:
144
+ status = exc.response.status_code if exc.response is not None else None
145
+ if status == 422:
146
+ logger.warning(
147
+ "Scene media API rejected payload for scene %s (422): %s",
148
+ scene_id,
149
+ exc.response.text if exc.response is not None else "",
150
+ )
151
+ return
152
+ if status == 500:
153
+ logger.warning(
154
+ "Scene media API encountered server error (500) for scene %s; skipping registration",
155
+ scene_id,
156
+ )
157
+ return
158
+ logger.exception("Failed to register scene media entries for scene %s", scene_id)
159
+ except requests.RequestException:
160
+ logger.exception("Failed to register scene media entries for scene %s", scene_id)
161
+
162
+
163
  def _resolve_frame_entry(entry: Any, *, index: int, dest_dir: Path) -> FrameRecord:
164
  metadata: dict[str, Any] = {}
165
  timestamp = None
 
400
  offset += request_limit
401
 
402
  return records
 
 
 
 
 
 
 
 
 
 
403
  def _save_pointmaps(
404
  *,
405
  runtime: WorkerRuntime,
 
         raise RuntimeError("Predictions missing world points")
 
     world_points = np.asarray(world_points)
+    confidence = pose_confidence(predictions)
     if confidence is None:
         confidence = np.ones(world_points.shape[:-1], dtype=np.float32)
 
         max_gap=max(0, settings.keyframe_max_gap_frames),
     )
     total_frames = len(frame_records)
+    confidence = pose_confidence(predictions)
     world_points = predictions.get("world_points")
     if world_points is None:
         world_points = predictions.get("world_points_from_depth")
 
 ) -> list[dict[str, Any]]:
     if top_k <= 0:
         return []
+    confidence = pose_confidence(predictions)
     if confidence is None:
         return []
     scores = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
 
         ceiling_margin_value = float(ceiling_margin_value) if ceiling_margin_value is not None else 0.05
     except (TypeError, ValueError):
         ceiling_margin_value = 0.05
+    ceiling_z_max = payload.get("ceiling_z_max")
+    try:
+        ceiling_z_max_value = float(ceiling_z_max) if ceiling_z_max is not None else None
+    except (TypeError, ValueError):
+        ceiling_z_max_value = None
 
     scene = predictions_to_glb(
         dict(predictions),
 
         prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
         ceiling_percentile=ceiling_percentile_value,
         ceiling_margin=ceiling_margin_value,
+        ceiling_z_max=ceiling_z_max_value,
     )
     scene.export(file_obj=str(local_file))
     key = runtime.storage.build_key(
 
     return _execute_job("model_build", payload, _handle_model_build)
 
 
+def _fallback_selection(frame_records: list[FrameRecord], top_k: int) -> KeyframeSelectionResult:
+    indices = linear_sample_indices(len(frame_records), top_k)
+    diagnostics = [
+        {
+            "frame_id": frame_records[idx].frame_id,
+            "frame_index": frame_records[idx].index,
+            "reason": "linear",
+        }
+        for idx in indices
+    ]
+    return KeyframeSelectionResult(indices=indices, diagnostics=diagnostics, top_k=len(indices))
+
+
+def keyframe_selection_job(payload: Mapping[str, Any]) -> dict[str, Any]:
+    runtime = get_runtime()
+    job = get_current_job()
+    payload = dict(payload)
+
+    job_id = str(payload.get("job_id") or (job.id if job else uuid.uuid4()))
+    scene_id = payload.get("scene_id")
+    if not scene_id:
+        raise ValueError("Keyframe job payload is missing 'scene_id'")
+    video_key = payload.get("video_key")
+    if not video_key:
+        raise ValueError("Keyframe job payload is missing 'video_key'")
+
+    job_type = "keyframe_selection"
+    job_meta = {
+        "job_id": job_id,
+        "job_type": job_type,
+        "scene_id": scene_id,
+    }
+
+    sanitized_payload = {
+        "scene_id": scene_id,
+        "video_key": video_key,
+        "top_k": payload.get("top_k"),
+        "extract_fps": payload.get("extract_fps"),
+        "extract_max_frames": payload.get("extract_max_frames"),
+    }
+
+    runtime.db.upsert_job(
+        job_id=job_id,
+        job_type=job_type,
+        scene_id=scene_id,
+        status="started",
+        payload=sanitized_payload,
+    )
+    runtime_emit(
+        runtime,
+        {
+            **job_meta,
+            "status": "started",
+            "progress": 0,
+            "ts": datetime.now(timezone.utc).timestamp(),
+        },
+    )
+
+    start_time = perf_counter()
+
+    try:
+        with tempfile.TemporaryDirectory(prefix=f"keyframe_{job_id}_") as tmp_dir:
+            temp_path = Path(tmp_dir)
+            video_path = temp_path / "input_video"
+            runtime.storage.download_to_path(video_key, video_path)
+
+            extract_fps = payload.get("extract_fps")
+            try:
+                extract_fps_value = float(extract_fps) if extract_fps is not None else runtime.settings.keyframe_extract_fps
+            except (TypeError, ValueError):
+                extract_fps_value = runtime.settings.keyframe_extract_fps
+
+            max_frames = _as_int(
+                payload.get("extract_max_frames"),
+                runtime.settings.keyframe_extract_max_frames,
+            )
+
+            frame_records, native_fps = extract_video_frames(
+                video_path,
+                temp_path / "frames",
+                target_fps=extract_fps_value,
+                max_frames=max_frames,
+            )
+
+            selection = run_keyframe_prepass(
+                runtime=runtime,
+                payload=payload,
+                frame_records=frame_records,
+                mode="window",
+                streaming=True,
+                window_size=runtime.settings.stream_window_size,
+            )
+            if selection is None or not selection.indices:
+                requested_top_k = _as_int(payload.get("top_k"), runtime.settings.keyframe_default_top_k)
+                selection = _fallback_selection(frame_records, requested_top_k)
+
+            selected_records = [frame_records[i] for i in selection.indices]
+            storage_entries, media_entries = build_keyframe_uploads(
+                runtime,
+                scene_id,
+                selected_records,
+                selection.diagnostics,
+                subdir=runtime.settings.keyframe_upload_dir,
+            )
+
+            _register_scene_media_entries(runtime, scene_id, media_entries)
+
+            result_payload = {
+                "job_id": job_id,
+                "job_type": job_type,
+                "scene_id": scene_id,
+                "video_key": video_key,
+                "native_fps": native_fps,
+                "total_frames": len(frame_records),
+                "selected_frames": storage_entries,
+                "selection": selection.diagnostics,
+            }
+
+    except Exception as exc:
+        error_text = traceback.format_exc()
+        runtime.db.upsert_job(
+            job_id=job_id,
+            job_type=job_type,
+            scene_id=scene_id,
+            status="failed",
+            error=error_text,
+        )
+        runtime_emit(
+            runtime,
+            {
+                **job_meta,
+                "status": "failed",
+                "ts": datetime.now(timezone.utc).timestamp(),
+                "error": str(exc),
+            },
+        )
+        logger.exception("Keyframe selection job %s failed", job_id)
+        raise
+
+    runtime.db.upsert_job(
+        job_id=job_id,
+        job_type=job_type,
+        scene_id=scene_id,
+        status="finished",
+        result=result_payload,
+    )
+
+    runtime_emit(
+        runtime,
+        {
+            **job_meta,
+            "status": "finished",
+            "progress": 100,
+            "ts": datetime.now(timezone.utc).timestamp(),
+        },
+    )
+
+    logger.info(
+        "Keyframe selection job %s finished in %.2fs (selected %d/%d frames)",
+        job_id,
+        perf_counter() - start_time,
+        len(selection.indices),
+        len(frame_records),
+    )
+
+    return result_payload
+
+
 def _handle_model_build(
     *,
     runtime: WorkerRuntime,
tests/test_voxel_reduction.py CHANGED
@@ -222,3 +222,47 @@ def test_predictions_to_glb_ceiling_filter():
     point_cloud = next(iter(scene.geometry.values()))
     max_z = point_cloud.vertices[:, 2].max()
     assert max_z < 1.6
+
+
+def test_predictions_to_glb_ceiling_absolute_cut():
+    world_points = np.array(
+        [
+            [
+                [[0.0, 0.0, 0.5], [0.0, 0.0, 1.0]],
+                [[0.0, 0.0, 1.2], [0.0, 0.0, 2.0]],
+            ]
+        ],
+        dtype=np.float32,
+    )
+    predictions = {
+        "world_points": world_points,
+        "world_points_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "world_points_from_depth": world_points,
+        "depth_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "images": np.ones((1, 2, 2, 3), dtype=np.float32) * 0.5,
+        "extrinsic": np.array(
+            [
+                [
+                    [1.0, 0.0, 0.0, 0.0],
+                    [0.0, 1.0, 0.0, 0.0],
+                    [0.0, 0.0, 1.0, 0.0],
+                ]
+            ],
+            dtype=np.float32,
+        ),
+    }
+
+    scene = predictions_to_glb(
+        predictions,
+        conf_thres=0.0,
+        voxel_size=None,
+        o3d_denoise=False,
+        density_filter=False,
+        reinflate_enabled=False,
+        ceiling_z_max=1.1,
+    )
+
+    assert isinstance(scene, trimesh.Scene)
+    point_cloud = next(iter(scene.geometry.values()))
+    max_z = point_cloud.vertices[:, 2].max()
+    assert max_z <= 1.1
worker/stream3r/jobs.py CHANGED
@@ -4,12 +4,13 @@ from __future__ import annotations
 
 from typing import Any, Callable, Mapping
 
-from stream3r.worker.tasks import model_build_job, pose_pointmap_job
+from stream3r.worker.tasks import keyframe_selection_job, model_build_job, pose_pointmap_job
 
 
 _HANDLERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {
     "pose_pointmap": pose_pointmap_job,
     "model_build": model_build_job,
+    "keyframe_selection": keyframe_selection_job,
 }
 
 
@@ -42,7 +43,7 @@ def handle_job(*args: Any, **kwargs: Any) -> Any:
     if isinstance(candidate, Mapping):
         payload = candidate
 
-    if payload is None and isinstance(args[0], Mapping):
+    if payload is None and args and isinstance(args[0], Mapping):
         payload = args[0]
     job_type = str(payload.get("job_type")) if payload else job_type
 
@@ -60,4 +61,3 @@ def handle_job(*args: Any, **kwargs: Any) -> Any:
         raise ValueError(f"Unsupported job_type '{job_type}'")
 
     return handler(payload)
-
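
For reference, the dispatch pattern this commit extends can be sketched in isolation. The handler and registry names below mirror the diff, but the job body is a stub standing in for the real worker task, and the runtime wiring (RQ, storage, DB) is omitted:

```python
from typing import Any, Callable, Mapping


def keyframe_selection_job(payload: Mapping[str, Any]) -> dict[str, Any]:
    # Stub: the real task downloads the video, runs the prepass, and
    # uploads the selected keyframes.
    return {"job_type": "keyframe_selection", "scene_id": payload["scene_id"]}


# Registry mapping job_type strings to handlers, as in _HANDLERS above.
_HANDLERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {
    "keyframe_selection": keyframe_selection_job,
}


def handle_job(*args: Any, **kwargs: Any) -> Any:
    # Accept the payload either as a keyword or as the first positional
    # argument; the `args and` guard is the bug fix from this commit.
    payload: Mapping[str, Any] | None = kwargs.get("payload")
    if payload is None and args and isinstance(args[0], Mapping):
        payload = args[0]
    if payload is None:
        raise ValueError("No payload supplied")
    job_type = str(payload.get("job_type"))
    handler = _HANDLERS.get(job_type)
    if handler is None:
        raise ValueError(f"Unsupported job_type '{job_type}'")
    return handler(payload)


result = handle_job({"job_type": "keyframe_selection", "scene_id": "scene-123"})
```

Registering the new task is then a one-line addition to `_HANDLERS`, with no change to the dispatch logic itself.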