brian4dwell committed on
Commit
01e8928
·
1 Parent(s): de1bede

split key framer out

design_docs/keyframe_service.md ADDED
@@ -0,0 +1,178 @@
1
+ ```markdown
2
+ # Keyframe Selection Service
3
+
4
+ **Author:** Brian Clark
5
+ **Last Updated:** 2025-11-07
6
+ **Audience:** Integrators needing motion-aware keyframes instead of linear sampling
7
+
8
+ ---
9
+
10
+ ## Overview
11
+
12
+ The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).
13
+
14
+ The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.
15
+
16
+ ---
17
+
18
+ ## Job Inputs
19
+
20
+ | Field | Required | Description |
21
+ |-------|----------|-------------|
22
+ | `scene_id` | ✅ | Scene identifier used in `scene_media` |
23
+ | `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
24
+ | `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
25
+ | `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
26
+ | `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
27
+ | `ceiling_percentile`, `ceiling_margin`, `ceiling_z_max` | optional | Pass-through filters for downstream GLB cleanup |
28
+
29
+ Example payload:
30
+
31
+ ```json
32
+ {
33
+ "job_type": "keyframe_selection",
34
+ "scene_id": "scene-123",
35
+ "video_key": "scene-data/videos/scene-123.mp4",
36
+ "top_k": 16,
37
+ "extract_fps": 6.0
38
+ }
39
+ ```
40
+
41
+ Enqueue via RQ:
42
+
43
+ ```python
44
+ from rq import Queue
45
+ from redis import Redis
46
+
47
+ queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))
48
+ queue.enqueue("worker.stream3r.jobs.handle_job", {
49
+ "job_type": "keyframe_selection",
50
+ "scene_id": "scene-123",
51
+ "video_key": "scene-data/videos/scene-123.mp4"
52
+ })
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Processing Pipeline
58
+
59
+ 1. **Download & Extract Frames**
60
+ - Video pulled via `runtime.storage.download_to_path`.
61
+ - Frames decoded with OpenCV at `extract_fps` (default 6 fps) up to `extract_max_frames`.
62
+
63
+ 2. **Motion + Coverage Pre-pass**
64
+ - Lightweight Stream3R windowed inference run to collect poses (`extrinsic`) and confidences.
65
+ - Motion scoring: translation + weighted rotation deltas, greedily ensures pose diversity.
66
+ - Coverage scoring: counts high-confidence voxel IDs contributed per frame, greedily maximizes new coverage.
67
+ - Diagnostics stored per frame (reason, motion delta, coverage gain, confidence).
68
+ - If inference fails, fallback to linear sampling.
69
+
70
+ 3. **Selection & Upload**
71
+ - Selected frames copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
72
+ - Scene media rows inserted through `record_scene_media_entries` API.
73
+ - Manifest (`selected_frames`) returned with storage keys and diagnostics.
74
+
75
+ 4. **Result**
76
+ - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics.
77
+ - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
78
+ - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs and skips registration without failing the job.
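The motion scoring in step 2 (translation plus weighted rotation deltas, applied greedily) can be sketched as follows. This is an illustrative simplification, not the worker's actual function; the real pre-pass also folds in coverage gain and confidence:

```python
import numpy as np

def select_keyframes(positions, rotations, top_k=16, motion_thresh=0.4, rot_weight=0.5):
    """Greedy motion-based selection: keep a frame once its pose has moved far
    enough from the last kept frame. `positions` is (N, 3) camera centers and
    `rotations` is (N, 3, 3) rotation matrices from the pose pre-pass."""
    kept = [0]  # always keep the first frame
    for i in range(1, len(positions)):
        j = kept[-1]
        trans_delta = float(np.linalg.norm(positions[i] - positions[j]))
        # geodesic rotation angle between the two camera orientations
        cos_angle = (np.trace(rotations[j].T @ rotations[i]) - 1.0) / 2.0
        rot_delta = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        if trans_delta + rot_weight * rot_delta >= motion_thresh:
            kept.append(i)
        if len(kept) >= top_k:
            break
    return kept
```

The thresholds mirror the `STREAM3R_KEYFRAME_MOTION_THRESH` and `STREAM3R_KEYFRAME_ROT_WEIGHT` defaults described in the Configuration section.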
79
+
80
+ ---
81
+
82
+ ## Outputs & Diagnostics
83
+
84
+ Result payload (simplified):
85
+
86
+ ```json
87
+ {
88
+ "job_id": "...",
89
+ "scene_id": "scene-123",
90
+ "video_key": "scene-data/videos/scene-123.mp4",
91
+ "native_fps": 29.97,
92
+ "total_frames": 420,
93
+ "selected_frames": [
94
+ {
95
+ "frame_id": "frame_000012",
96
+ "frame_index": 12,
97
+ "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
98
+ "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
99
+ "diagnostics": {
100
+ "reason": "motion",
101
+ "motion_delta": 0.42,
102
+ "coverage_gain_ratio": 0.08,
103
+ "mean_confidence": 0.67
104
+ }
105
+ }
106
+ ]
107
+ }
108
+ ```
109
+
110
+ Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
111
+
112
+ ---
113
+
114
+ ## Configuration
115
+
116
+ Environment variables for fine-tuning:
117
+
118
+ | Env Var | Default | Notes |
119
+ |---------|---------|-------|
120
+ | `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
121
+ | `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
122
+ | `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
123
+ | `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
124
+ | `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
125
+ | `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
126
+ | `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
127
+ | `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
128
+ | `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
129
+ | `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when below |
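Resolving the table above into a config dict might look like this sketch (the function name and return shape are hypothetical; only the variable names and defaults come from the table):

```python
import os

def keyframe_config(env=os.environ):
    """Resolve keyframe tuning knobs with the documented defaults."""
    return {
        "extract_fps": float(env.get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        "extract_max_frames": int(env.get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        "upload_dir": env.get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        "top_k": int(env.get("STREAM3R_KEYFRAME_TOP_K", "16")),
        "prepass": env.get("STREAM3R_KEYFRAME_PREPASS", "1") == "1",
        "motion_thresh": float(env.get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        "rot_weight": float(env.get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        "min_gain": float(env.get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        "full_max_frames": int(env.get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    }
```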
130
+
131
+ Scene media API requirements:
132
+ - `STREAM3R_MEDIA_API_BASE_URL`
133
+ - `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts
134
+
135
+ Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
136
+
137
+ ---
138
+
139
+ ## Integration Steps for External Services
140
+
141
+ 1. **Deploy Worker Queue**
142
+ - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the env var is set).
143
+ - Note the GPU requirement: the pre-pass runs Stream3R inference, so prefer GPU hosts; CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu` and accept slower pre-passes.
144
+
145
+ 2. **Enqueue Jobs**
146
+ - Replace existing linear sampling code with an RQ enqueue call.
147
+ - Store job IDs if you need to poll job status or consume events.
148
+
149
+ 3. **Consume Results**
150
+ - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries.
151
+ - Inspect returned diagnostics for debugging or to render navigation overlays.
152
+
153
+ 4. **Fallback Handling**
154
+ - If the job fails, the queue returns error details; you can revert to your existing sampler.
155
+ - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
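The retry suggestion in step 4 can be captured in a small client-side helper (`retry_payload` is hypothetical, not part of the worker; the halving schedule is one reasonable choice):

```python
DEFAULT_TOP_K = 16

def retry_payload(original, attempt):
    """Build a retry payload for a failed keyframe_selection job.
    Halves the selection budget each attempt (floored at 4) so retries
    get progressively cheaper; the original payload is left untouched."""
    top_k = max(4, original.get("top_k", DEFAULT_TOP_K) // (2 ** attempt))
    retry = dict(original)
    retry["top_k"] = top_k
    return retry
```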
156
+
157
+ ---
158
+
159
+ ## Benefits vs. Linear Sampling
160
+
161
+ - **Fewer redundant frames**: motion-aware spacing ensures pose diversity.
162
+ - **Better geometry coverage**: only keeps frames that add new high-confidence voxels.
163
+ - **Consistent diagnostics**: each selected frame includes reasons and confidence, aiding QA.
164
+ - **Automatic uploads**: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
165
+
166
+ ---
167
+
168
+ ## Future Enhancements
169
+
170
+ - Optional semantic filtering to avoid ceilings/walls.
171
+ - Exposure of thumbnails or depth maps alongside keyframes.
172
+ - Batch selection across multiple videos.
173
+
174
+ ---
175
+
176
+ For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.
177
+
178
+ ```
docs/api_usage.md ADDED
@@ -0,0 +1,598 @@
1
+ # Scene Graph Manager API Usage
2
+
3
+ This document gives a concise, high-signal overview of the HTTP API so other services can integrate without reading the whole codebase.
4
+
5
+ ## Refresh Checklist (for future updates)
6
+
7
+ Use this prompt when endpoints change:
8
+
9
+ > Run `rg "@app" -n api/app.py` and list new/modified routes. Update `docs/api_usage.md` with:
10
+ > - Endpoint table (method, path, summary)
11
+ > - Auth / header requirements
12
+ > - Request & response JSON samples (curl when useful)
13
+ > - Notes on query params, error behaviors, and streaming endpoints.
14
+ > Re-run `python -m compileall api/app.py` and ensure the doc still reflects reality.
15
+
16
+ ## Base URL
17
+
18
+ All paths below are relative to:
19
+
20
+ ```
21
+ https://scene-graph-mgr-api.fly.dev
22
+ ```
23
+
24
+ ## Authentication
25
+
26
+ - Public endpoints are unauthenticated but subject to Fly.io rate limits.
27
+ - Internal write endpoints require a shared secret header: `x-internal-secret: <secret>`.
28
+ - Standard error payload:
29
+
30
+ ```json
31
+ {
32
+ "detail": "Human readable message",
33
+ "error_code": "optional-machine-code"
34
+ }
35
+ ```
36
+
37
+ 4xx indicates caller issues (validation, missing scene). 5xx means server-side failure.
38
+
39
+ ## Endpoint Map
40
+
41
+ | Method | Path | Summary |
42
+ | --- | --- | --- |
43
+ | GET | `/healthz` | Liveness check. |
44
+ | GET | `/scenes` | List scenes with summary metadata (optional `include_empty`). |
45
+ | PUT | `/scenes/{scene_id}` | Overwrite a scene with a full graph payload. |
46
+ | PATCH | `/scenes/{scene_id}` | Apply an RFC 6902 patch to the latest graph. |
47
+ | POST | `/scenes/{scene_id}/create` | Seed an empty scene (optionally overwrite). |
48
+ | GET | `/scenes/{scene_id}/versions/latest` | Latest graph version. |
49
+ | GET | `/scenes/{scene_id}/versions` | Metadata for all versions. |
50
+ | GET | `/scenes/{scene_id}/versions/{version_id}` | Raw version record (JSON). |
51
+ | POST | `/scenes/{scene_id}/add_image` | Upload image bytes and enqueue processing. |
52
+ | POST | `/scenes/{scene_id}/add_images_from_keys` | Enqueue processing for existing S3 keys. |
53
+ | POST | `/scenes/{scene_id}/upload_video` | Upload video and queue frame extraction batch. |
54
+ | POST | `/scenes/{scene_id}/upload_image` | Browser upload → WebP resize → presigned URL. |
55
+ | GET | `/scenes/{scene_id}/instances` | List stored instances for the scene (status filters optional). |
56
+ | GET | `/scenes/{scene_id}/objects/{obj_id}/beliefs` | Latest improver belief for an object. |
57
+ | GET | `/scenes/{scene_id}/objects/{obj_id}/instances` | List stored instances for an object (status filters optional). |
58
+ | GET | `/scenes/{scene_id}/improver/backlog` | Inspect improver queue/backlog for a scene. |
59
+ | POST | `/scenes/{scene_id}/objects/{obj_id}/instances/{instance_id}/improve_instance` | Queue improver workflow for an instance. |
60
+ | GET | `/scenes/{scene_id}/runlogs` | Recent improver run logs (filterable). |
61
+ | GET | `/scenes/{scene_id}/change-requests` | Inspect change request queue (pending/applied). |
62
+ | GET | `/scenes/{scene_id}/change-summaries` | Stream applied change summaries (newest first). |
63
+ | GET | `/scenes/{scene_id}/diff` | Structural diff between two versions (machine patch & stats). |
64
+ | GET | `/scenes/{scene_id}/diff-semantic` | Semantic diff (ID-aware ops). |
65
+ | GET | `/scenes/{scene_id}/images/presign` | Generate presigned GET URL for an image key. |
66
+ | GET | `/scenes/{scene_id}/media` | List recorded media assets (images/videos). |
67
+ | POST | `/scenes/{scene_id}/media` | Upsert scene media entries supplied by workers or services. |
68
+ | GET | `/jobs/{job_id}` | Inspect queued/finished RQ jobs. |
69
+ | POST | `/stream3r/jobs` | Submit a reconstruction job (pose pointmap or model build). |
70
+ | GET | `/stream3r/jobs/{job_id}` | Inspect a Stream3R job record. |
71
+ | GET | `/stream3r/jobs` | List Stream3R jobs (filter by scene, type, status). |
72
+ | GET | `/stream3r/jobs/{job_id}/events` | Server-Sent Events feed for job lifecycle updates. |
73
+ | GET | `/stream3r/models/{scene_id}/presign` | Presign the latest model_build scene.glb for download. |
74
+ | POST | `/internal/commit-version` | Worker/internal scene commit (requires secret). |
75
+ | GET | `/debug/s3-ping` | Smoke-test S3/B2 credentials. |
76
+ | WS | `/ws?channel=all|{scene_id}` | Broadcasts scene events over WebSocket. |
77
+
78
+ ---
79
+
80
+ ## Endpoint Details & Examples
81
+
82
+ ### Health & Diagnostics
83
+
84
+ #### `GET /healthz`
85
+ Returns `{ "ok": true, "ts": "ISO timestamp" }` for uptime monitoring.
86
+
87
+ #### `GET /debug/s3-ping`
88
+ Verifies object storage connectivity using a put/delete round-trip. Good for smoke tests.
89
+
90
+ ### Scene Lifecycle
91
+
92
+ #### `GET /scenes`
93
+ List scenes. Optional `include_empty=true` includes scenes with no versions yet.
94
+
95
+ ```bash
96
+ curl "https://scene-graph-mgr-api.fly.dev/scenes?include_empty=true"
97
+ ```
98
+
99
+ Response contains summaries with counts and most recent version metadata.
100
+
101
+ #### `PUT /scenes/{scene_id}`
102
+ Replace entire scene graph.
103
+
104
+ Request body (`SceneGraphEnvelope`):
105
+ ```json
106
+ {
107
+ "scene_location_id": "scene-123",
108
+ "scene_graph": {"objects": [], "relations": []},
109
+ "meta": {"source": "manual"}
110
+ }
111
+ ```
112
+ Creates a new version record and broadcasts `scene.put` on WebSocket.
113
+
114
+ #### `PATCH /scenes/{scene_id}`
115
+ Apply RFC 6902 patch to latest graph. Optional `base_version` guard prevents lost updates.
116
+
117
+ ```json
118
+ {
119
+ "scene_location_id": "scene-123",
120
+ "json_patch": [
121
+ {"op": "add", "path": "/objects/-", "value": {"id": "chair-1", "attributes": {}}}
122
+ ],
123
+ "base_version": 1728400000000
124
+ }
125
+ ```
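A client can assemble this body programmatically before sending it with its HTTP library of choice. A minimal sketch (only the field names come from the example above; the helper itself is hypothetical):

```python
def build_patch_request(scene_id, operations, base_version=None):
    """Assemble a PATCH body with the optional optimistic-concurrency guard.
    `operations` is a list of RFC 6902 operation objects."""
    body = {"scene_location_id": scene_id, "json_patch": list(operations)}
    if base_version is not None:
        body["base_version"] = base_version
    return body
```

On a stale `base_version` guard, expect a 4xx error payload per the convention in the Authentication section.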
126
+
127
+ #### `POST /scenes/{scene_id}/create`
128
+ Seeds an empty graph. `overwrite=true` allows reseeding existing scenes.
129
+
130
+ #### Version Reads
131
+ - `GET /scenes/{scene_id}/versions/latest` → latest full graph.
132
+ - `GET /scenes/{scene_id}/versions` → list of `{version_id, created_at, bytes}`.
133
+ - `GET /scenes/{scene_id}/versions/{version_id}` → raw record (graph + metadata).
134
+
135
+ ### Image Upload & Processing
136
+
137
+ #### `POST /scenes/{scene_id}/add_image`
138
+ Multipart upload (`file=@image.jpg`). Stores bytes via S3-compatible API, seeds scene if empty, enqueues `worker.tasks.process_image_for_scene`. Response includes RQ `job_id` and filename.
139
+
140
+ #### `POST /scenes/{scene_id}/add_images_from_keys`
141
+ JSON body:
142
+ ```json
143
+ {
144
+ "keys": ["scenes/scene-123/images/20241008/a.png"],
145
+ "room_hint": "living_room",
146
+ "prompt": "describe objects",
147
+ "bounding_boxes": {"a.png": [[0.1,0.2,0.5,0.8]]}
148
+ }
149
+ ```
150
+ Seeds scene if needed and enqueues batch worker job.
151
+
152
+ Response (`AddImagesFromKeysResponse`):
153
+
154
+ ```json
155
+ {
156
+ "scene_location_id": "scene-123",
157
+ "queued_at": "2024-10-08T14:12:05Z",
158
+ "job_id": "rq-job-id",
159
+ "keys": [
160
+ "scenes/scene-123/images/20241008/a.png",
161
+ "scenes/scene-123/images/20241008/b.png"
162
+ ]
163
+ }
164
+ ```
165
+
166
+ #### `POST /scenes/{scene_id}/upload_image`
167
+ For browser uploads. Accepts multipart file, resizes to WebP (max width 1024), uploads, and returns:
168
+ ```json
169
+ {
170
+ "scene_location_id": "scene-123",
171
+ "key": "scenes/scene-123/images/20241008/abc.webp",
172
+ "url": "https://...presigned...",
173
+ "width": 768,
174
+ "height": 512,
175
+ "bytes": 123456
176
+ }
177
+ ```
178
+
179
+ #### `GET /scenes/{scene_id}/images/presign`
180
+ Query parameters: `key` (must reside under `scenes/{scene_id}/images/…` **or** `scenes/{scene_id}/videos/…`) and optional `expires`. Returns `{ "url": "...", "expires_in": 900 }`.
181
+
182
+ ### Video Upload & Frame Extraction
183
+
184
+ #### `POST /scenes/{scene_id}/upload_video`
185
+ Accepts a multipart video upload (e.g., `file=@walkthrough.mp4`). Stores the binary in object storage and seeds an empty scene when needed. The worker enqueues `process_video_for_scene`, which:
186
+
187
+ - extracts WebP frames at 2fps by default (`frame_interval=0.5` seconds) and stores them under `scenes/{scene_id}/images/video_frames/...` (or keeps the source under `scenes/{scene_id}/videos/...`) so the `/images/presign` endpoint can serve them;
188
+ - retries with a software H.264 transcode if AV1 or other codecs fail to decode on the host;
189
+ - publishes the same `scene.update` event stream as still-image uploads.
190
+
191
+ Response payload (`UploadVideoResponse`):
192
+
193
+ ```json
194
+ {
195
+ "scene_location_id": "scene-123",
196
+ "key": "scenes/scene-123/videos/20241008/abcd1234.mp4",
197
+ "filename": "walkthrough.mp4",
198
+ "size_bytes": 456789012,
199
+ "content_type": "video/mp4",
200
+ "queued_at": "2024-10-08T14:12:05Z",
201
+ "job_id": "rq-job-id"
202
+ }
203
+ ```
204
+ Clients should poll `GET /jobs/{job_id}` to track progress. When `job.result.frame_keys` is present the frame extraction succeeded.
205
+
206
+ ### Scene Media
207
+
208
+ #### `GET /scenes/{scene_id}/media`
209
+ Lists stored media records for a scene. Use `media_type=image` to filter to keyframes vs. `media_type=video` for source clips.
210
+
211
+ ```bash
212
+ curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/media?media_type=image&limit=50"
213
+ ```
214
+
215
+ Response (`SceneMediaListResponse`) includes the total slice returned plus ISO timestamps normalized to UTC.
216
+
217
+ #### `POST /scenes/{scene_id}/media`
218
+ Upserts media entries (usually called by workers after uploading files). Payload:
219
+
220
+ ```json
221
+ {
222
+ "entries": [
223
+ {
224
+ "file": "scenes/scene-123/keyframes/frame_000012.jpg",
225
+ "media_type": "image",
226
+ "captured_at": "2024-10-08T14:12:05Z"
227
+ },
228
+ {
229
+ "file": "scenes/scene-123/videos/20241008/abcd1234.mp4",
230
+ "media_type": "video"
231
+ }
232
+ ]
233
+ }
234
+ ```
235
+
236
+ - `media_type` is optional—when omitted the server infers it from the filename (`image` vs `video`).
237
+ - `captured_at` accepts ISO-8601 strings (UTC preferred). If omitted, the server stores `now()`.
238
+ - Existing rows are updated in-place thanks to an upsert on the `file` column.
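The filename-based inference can be approximated client-side as follows (the exact extension sets the server recognizes are an assumption):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".avi"}

def infer_media_type(key):
    """Guess image vs. video from a storage key's extension; None if unknown."""
    ext = Path(key).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return None
```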
239
+
240
+ Response (`SceneMediaBatchResponse`) summarizes how many entries were accepted:
241
+
242
+ ```json
243
+ {
244
+ "scene_id": "scene-123",
245
+ "accepted": 2,
246
+ "skipped": 0,
247
+ "files": [
248
+ "scenes/scene-123/keyframes/frame_000012.jpg",
249
+ "scenes/scene-123/videos/20241008/abcd1234.mp4"
250
+ ]
251
+ }
252
+ ```
253
+
254
+ ### Improver Monitoring
255
+
256
+ #### `GET /scenes/{scene_id}/improver/backlog`
257
+ Summarizes outstanding improver work for a scene. Default statuses are `pending`, `queued`, and `processing`, plus records that do not yet have a status. Use this to detect when the improver has drained the backlog.
258
+
259
+ Query parameters:
260
+ - `limit` (default `200`, max `2000`) — cap the number of instances returned.
261
+ - `status` — optional list to override which statuses count as “not yet processed.” Omit to use the defaults.
262
+ - `include_missing_status` (default `true`) — include rows with `status IS NULL`.
263
+
264
+ Response:
265
+
266
+ ```json
267
+ {
268
+ "scene_id": "scene-123",
269
+ "count": 2,
270
+ "instances": [
271
+ {
272
+ "scene_id": "scene-123",
273
+ "obj_id": "obj-1",
274
+ "instance_id": "inst-42",
275
+ "status": "pending",
276
+ "status_reason": "ambient",
277
+ "status_changed_at": "2024-10-08T14:15:06Z",
278
+ "last_event_at": "2024-10-08T14:15:06Z",
279
+ "created_at": "2024-10-08T13:59:40Z",
280
+ "data": {
281
+ "image_id": "img-998",
282
+ "bbox_xyxy": [0.1, 0.2, 0.5, 0.8]
283
+ }
284
+ }
285
+ ]
286
+ }
287
+ ```
288
+ When `count` is zero the improver has no pending work for that scene.
289
+
290
+ ### Improver & Beliefs
291
+
292
+ #### `POST /scenes/{scene_id}/objects/{obj_id}/instances/{instance_id}/improve_instance`
293
+ Queues improver workflow. Body adheres to `InstanceEvent` schema. Example:
294
+
295
+ ```bash
296
+ curl -X POST \
297
+ -H "Content-Type: application/json" \
298
+ -d '{
299
+ "scene_id": "scene-123",
300
+ "obj_id": "sofa-1",
301
+ "instance_id": "inst-456",
302
+ "image_id": "scenes/scene-123/images/20241008/sofa.png",
303
+ "bbox_xyxy": [0.1, 0.3, 0.6, 0.9]
304
+ }' \
305
+ https://scene-graph-mgr-api.fly.dev/scenes/scene-123/objects/sofa-1/instances/inst-456/improve_instance
306
+ ```
307
+
308
+ Response: `{ "enqueued": true, "job_id": "rq-job-id" }`.
309
+
310
+ The worker embeds, seeds Qdrant, and calls `scene_improver.tasks.improve_scene`.
311
+
312
+ #### `GET /scenes/{scene_id}/objects/{obj_id}/beliefs`
313
+ Returns latest belief payload (name Dirichlet, attribute betas, relations) or 404 if none stored.
314
+
315
+ #### `GET /scenes/{scene_id}/instances`
316
+ Returns persisted instances across every object in the scene. Query params:
317
+ - `limit` (1–5000, default 1000)
318
+ - `status` (optional, repeatable) — only include instances whose `status` matches one of the provided values.
319
+ - `exclude_status` (optional, repeatable) — omit instances whose `status` matches any of the provided values (e.g., `exclude_status=superseded` to skip reassigned instances).
320
+
321
+ Response mirrors the storage map:
322
+
323
+ ```json
324
+ {
325
+ "scene_id": "scene-123",
326
+ "count": 5,
327
+ "objects": {
328
+ "sofa-1": [{...}, {...}],
329
+ "lamp-4": [{...}]
330
+ }
331
+ }
332
+ ```
333
+
334
+ #### `GET /scenes/{scene_id}/objects/{obj_id}/instances`
335
+ Returns the persisted instance rows for the object. Query params:
336
+ - `limit` (1–1000, default 100)
337
+ - `status` (optional, repeatable) — only include instances whose `status` matches one of the provided values.
338
+ - `exclude_status` (optional, repeatable) — omit instances whose `status` matches any of the provided values.
339
+
340
+ Example request skipping superseded items:
341
+
342
+ ```bash
343
+ curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/objects/sofa-1/instances?limit=20&exclude_status=superseded"
344
+ ```
345
+
346
+ Response:
347
+
348
+ ```json
349
+ {
350
+ "scene_id": "scene-123",
351
+ "obj_id": "sofa-1",
352
+ "count": 2,
353
+ "instances": [
354
+ {
355
+ "id": "inst-456",
356
+ "image_id": "scenes/...",
357
+ "bbox_xyxy": [0.1,0.3,0.6,0.9],
358
+ "captured_at": 1728401000,
359
+ "status": "processed"
360
+ },
361
+ {
362
+ "id": "inst-123",
363
+ "image_id": "scenes/...",
364
+ "bbox_xyxy": [0.05,0.2,0.4,0.7],
365
+ "status": "pending"
366
+ }
367
+ ]
368
+ }
369
+ ```
370
+
371
+ #### `GET /scenes/{scene_id}/runlogs`
372
+ Query params:
373
+ - `limit` (1–1000, default 100)
374
+ - `obj_id` (optional filter)
375
+ - `instance_id` (optional filter)
376
+
377
+ Response contains `records` newest-first with `step`, `message`, `data`, timestamps, and run IDs.
378
+
379
+ #### `GET /scenes/{scene_id}/change-requests`
380
+ Inspect the change request queue. Query params:
381
+ - `state` (optional: `pending`, `applied`, `stale`)
382
+ - `limit` (1–200, default 50)
383
+
384
+ Returns an array mirroring the DB record with preconditions, payload, result, and the new `applied_summary` field:
385
+
386
+ ```json
387
+ [
388
+ {
389
+ "request_id": "95e8...",
390
+ "scene_id": "scene-123",
391
+ "obj_id": "table-1",
392
+ "requested_by": "belief-agent",
393
+ "state": "applied",
394
+ "confidence": 0.92,
395
+ "payload": {"operations": [...]},
396
+ "result": {"summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""},
397
+ "applied_at": "2024-11-19T20:14:32.123Z",
398
+ "applied_summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""
399
+ }
400
+ ]
401
+ ```
402
+
403
+ #### `GET /scenes/{scene_id}/change-summaries`
404
+ Lightweight feed of applied change blurbs, ordered by `applied_at` descending. Supports `limit` (1–200, default 50).
405
+
406
+ ```bash
407
+ curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/change-summaries?limit=20"
408
+ ```
409
+
410
+ ```json
411
+ [
412
+ {
413
+ "request_id": "95e8...",
414
+ "scene_id": "scene-123",
415
+ "obj_id": "table-1",
416
+ "applied_version": 1729300042000,
417
+ "applied_at": "2024-11-19T20:14:32.123Z",
418
+ "summary": "Updated object \"wooden table\" attribute \"size\" from \"medium\" to \"large\""
419
+ },
420
+ {
421
+ "request_id": "73f1...",
422
+ "scene_id": "scene-123",
423
+ "summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""
424
+ }
425
+ ]
426
+ ```
427
+
428
+ Use this endpoint for ambient “stream of consciousness” UIs showing how the improver evolves the scene.
429
+
430
+ ### Diffs
431
+
432
+ #### `GET /scenes/{scene_id}/diff`
433
+ Compare two versions (`from_version`, `to_version`). Optional `mode` query (`patch`, `summary`, `both`). Returns machine patch plus stats when requested.
434
+
435
+ #### `GET /scenes/{scene_id}/diff-semantic`
436
+ ID-aware diff returning ordered operations (append/remove/replace) with summary stats.
437
+
438
+ ### Stream3R Reconstruction Jobs
439
+
440
+ The Stream3R API wraps the reconstruction workers (pose pointmaps and full models) and provides idempotent enqueue, polling, and event streaming.
441
+
442
+ #### `POST /stream3r/jobs`
443
+ Submit a reconstruction job. Supported `job_type` values are `pose_pointmap` and `model_build`. Provide at least one frame (`url` or `path`) and optional `client_request_id` for idempotency (subsequent calls reuse the existing job and return `200 OK`).
444
+
445
+ ```bash
446
+ curl -X POST https://scene-graph-mgr-api.fly.dev/stream3r/jobs \
447
+ -H "Content-Type: application/json" \
448
+ -d '{
449
+ "job_type": "pose_pointmap",
450
+ "scene_id": "scene-123",
451
+ "frames": [
452
+ {"url": "https://cdn.example/scene-123/frame_0000.webp"},
453
+ {"path": "/mnt/captures/scene-123/frame_0001.png"}
454
+ ],
455
+ "session_settings": {"prediction_mode": "pointmap"},
456
+ "client_request_id": "scene-123-20241008-run1"
457
+ }'
458
+ ```
459
+
460
+ Response (`202 Accepted` on first submission):
461
+
462
+ ```json
463
+ {
464
+ "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
465
+ "job_type": "pose_pointmap",
466
+ "scene_id": "scene-123",
467
+ "status": "queued",
468
+ "created_at": "2024-10-08T14:12:05.417Z",
469
+ "payload": {
470
+ "job_type": "pose_pointmap",
471
+ "scene_id": "scene-123",
472
+ "frames": [
473
+ {"url": "https://cdn.example/scene-123/frame_0000.webp"},
474
+ {"path": "/mnt/captures/scene-123/frame_0001.png"}
475
+ ],
476
+ "session_settings": {"prediction_mode": "pointmap"}
477
+ }
478
+ }
479
+ ```
480
+
481
+ #### `GET /stream3r/jobs/{job_id}`
482
+ Fetch the canonical job record (backed by Postgres). Fields include:
483
+ - `status`: `queued`, `started`, `progress`, `finished`, or `failed`
484
+ - `result`: worker-published artifact manifest (S3 URLs, local paths)
485
+ - `error`: error string when status is `failed`
486
+ - timestamps (`created_at`, `started_at`, `completed_at`)
487
+
488
+ Typical successful response:
489
+
490
+ ```json
491
+ {
492
+ "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
493
+ "job_type": "model_build",
494
+ "scene_id": "scene-123",
495
+ "status": "finished",
496
+ "created_at": "2024-10-08T14:12:05.417Z",
497
+ "started_at": "2024-10-08T14:12:12.998Z",
498
+ "completed_at": "2024-10-08T14:24:39.221Z",
499
+ "result": {
500
+ "model_dir": "s3://bucket/scene-123/stream3r/models/20241008",
501
+ "summary_url": "s3://bucket/scene-123/stream3r/models/20241008/summary.json"
502
+ },
503
+ "error": null,
504
+ "client_request_id": "scene-123-20241008-run1"
505
+ }
506
+ ```
507
+
508
+ #### `GET /stream3r/jobs`
509
+ List jobs with optional filters:
510
+ - `scene_id`
511
+ - `job_type`
512
+ - `status`
513
+ - `limit` (1–200, default 50)
514
+ - `offset` (default 0)
515
+
516
+ Returns `{ "jobs": [...], "limit": 50, "offset": 0 }` with the same schema as the single-job response.
517
+
518
+ #### `GET /stream3r/jobs/{job_id}/events`
519
+ Server-Sent Events feed backed by Redis Streams. Use it for near-real-time updates in browser dashboards.
520
+
521
+ ```bash
522
+ curl --no-buffer \
523
+ -H "Accept: text/event-stream" \
524
+ "https://scene-graph-mgr-api.fly.dev/stream3r/jobs/d8f8a3fc-3aed-441c-ac78-2b953a9229bf/events"
525
+ ```
526
+
527
+ Events are emitted as standard SSE payloads:
528
+
529
+ ```
530
+ id: 1728409930123-0
531
+ event: progress
532
+ data: {"job_id":"d8f8a3fc-3aed-441c-ac78-2b953a9229bf","status":"progress","progress":65}
533
+
534
+ id: 1728409960456-0
535
+ event: finished
536
+ data: {"job_id":"d8f8a3fc-3aed-441c-ac78-2b953a9229bf","status":"finished","result_url":"s3://bucket/.../summary.json"}
537
+ ```
538
+
539
+ Reconnect with the last `id` to resume (`?last_id=<redis-stream-id>`). When the worker encounters an error the stream emits `event: failed` with an `error` field.
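For quick scripts, the event frames above can be parsed with a few lines of Python (production clients should prefer a dedicated SSE library that also handles `retry:` fields, comments, and multi-line `data:`):

```python
def parse_sse(payload):
    """Split a text/event-stream body into dicts of its id/event/data fields.
    Blank lines delimit events; field values keep any colons after the first."""
    events, current = [], {}
    for line in payload.splitlines():
        if not line.strip():
            if current:
                events.append(current)
                current = {}
            continue
        field, _, value = line.partition(":")
        current[field.strip()] = value.strip()
    if current:
        events.append(current)
    return events
```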
540
+
541
+ > **Implementation note:** the current worker stub still returns `status="failed"` with `error="stream3r worker not implemented"` until the GPU-backed handler ships. Downstream clients should surface the error text to operators and may retry later.
542
+
543
+ #### `GET /stream3r/models/{scene_id}/presign`
544
+ Fetch a presigned download URL for the most recent `model_build` job's `scene.glb`. Optional query params:
545
+ - `job_id` — force a specific job (must belong to the scene).
546
+ - `expires` — TTL in seconds (default 900, range 60–86400).
547
+
548
+ ```bash
549
+ curl "https://scene-graph-mgr-api.fly.dev/stream3r/models/scene-123/presign?expires=600"
550
+ ```
551
+
552
+ Response:
553
+
554
+ ```json
555
+ {
556
+ "scene_id": "scene-123",
557
+ "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
558
+ "key": "scenes/scene-123/stream3r/models/20241008/scene.glb",
559
+ "url": "https://s3.amazonaws.com/...signature...",
560
+ "expires_in": 600
561
+ }
562
+ ```
563
+
564
+ Returns `404` if no successful model build exists or the job did not publish a GLB artifact.
565
+
566
+ ### Jobs & Internal Ops
567
+
568
+ #### `GET /jobs/{job_id}`
569
+ Inspect RQ job status. Returns timestamps, result payload (if finished), and truncated stack trace when failed.
570
+
571
+ #### `POST /internal/commit-version`
572
+ Worker-only commit. Body:
573
+ ```json
574
+ {
575
+ "scene_location_id": "scene-123",
576
+ "scene_graph": {...},
577
+ "base_version": 1728400000000,
578
+ "meta": {"source": "worker"}
579
+ }
580
+ ```
581
+ Requires `x-internal-secret` header when enabled. Creates new version, broadcasts `scene.update`, and publishes on Redis pub/sub.
582
+
583
+ ### WebSocket Stream
+ 
+ #### `WS /ws?channel=all|{scene_id}`
+ Receives JSON events when scenes change (`scene.create`, `scene.put`, `scene.patch`, `scene.update`). Use it to refresh UI state in real time.
+ 
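A minimal dispatch sketch for the four event kinds, assuming each message carries a `type` field naming the event (the payload shape beyond the event names is an assumption, not documented here).

```python
def dispatch(event: dict, handlers: dict) -> str:
    """Route a scene event (scene.create/put/patch/update) to its handler."""
    kind = event.get("type")
    handler = handlers.get(kind)
    if handler is None:
        return "ignored"
    handler(event)
    return kind

seen = []
handlers = {t: seen.append for t in ("scene.create", "scene.put", "scene.patch", "scene.update")}
result = dispatch({"type": "scene.update", "scene_id": "scene-123"}, handlers)
```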
+ ---
+ 
+ ## Notes & Best Practices
+ 
+ - All timestamps are UTC ISO strings.
+ - Scene writes (`PUT`, `PATCH`, `create`, `commit-version`) broadcast on WebSocket and publish to the Redis channel `scene_events`.
+ - Object storage paths follow `scenes/{scene_id}/images/...`; the presign endpoint enforces this prefix.
+ - Scene graph payloads no longer embed instance blobs; use the `/instances` endpoint for that data.
+ - Postgres storage is required; filesystem fallbacks have been removed from the API/worker flows.
+ - Queue jobs default to a 15-minute timeout (image) or 30-minute timeout (batch). Track job progress via `/jobs/{job_id}` or the Redis CLI.
+ - Improver run logs are persisted to Postgres (if configured) and mirrored to JSONL under `SCENE_RUN_LOG_DIR`.
notes.md CHANGED
@@ -2,3 +2,5 @@
  
  **Manually Clear the GPU Lock from REDIS:
  redis-cli -u "$REDIS_URL" DEL gpu:lock
+ 
+ export INTERNAL_NOTIFY_SECRET='82d4acd547e449fe';
stream3r/utils/__pycache__/visual_utils.cpython-311.pyc CHANGED
Binary files a/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc and b/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc differ
 
stream3r/utils/visual_utils.py CHANGED
@@ -327,6 +327,7 @@ def predictions_to_glb(
      reinflate_seed: int | None = None,
      ceiling_percentile: float | None = None,
      ceiling_margin: float = 0.05,
+     ceiling_z_max: float | None = None,
  ) -> trimesh.Scene:
      """
      Converts predictions to a 3D scene represented as a GLB file.
@@ -364,6 +365,7 @@ def predictions_to_glb(
          reinflate_seed (Optional[int]): RNG seed for deterministic reinflation.
          ceiling_percentile (Optional[float]): Remove points above this Z percentile (0-100).
          ceiling_margin (float): Margin subtracted from percentile cutoff (meters).
+         ceiling_z_max (Optional[float]): Remove points with Z >= this absolute height (meters).
  
      Returns:
          trimesh.Scene: Processed 3D scene containing point cloud and cameras
@@ -544,6 +546,20 @@ def predictions_to_glb(
          colors_rgb = colors_rgb[keep_mask]
          conf_used = conf_used[keep_mask]
  
+     if ceiling_z_max is not None and vertices_3d.size:
+         try:
+             z_limit = float(ceiling_z_max)
+         except (TypeError, ValueError):
+             z_limit = None
+         if z_limit is not None:
+             keep_mask = vertices_3d[:, 2] < z_limit
+             if not np.any(keep_mask):
+                 keep_mask = vertices_3d[:, 2] <= z_limit
+             if np.any(keep_mask) and np.count_nonzero(keep_mask) < vertices_3d.shape[0]:
+                 vertices_3d = vertices_3d[keep_mask]
+                 colors_rgb = colors_rgb[keep_mask]
+                 conf_used = conf_used[keep_mask]
+ 
      if effective_voxel_size is not None and voxel_after_conf and vertices_3d.size:
          before_count = vertices_3d.shape[0]
          vertices_3d, colors_rgb, conf_used = voxel_reduce(
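The new `ceiling_z_max` cutoff can be exercised in isolation; a sketch mirroring the hunk's strict-then-inclusive masking (the standalone function name is illustrative).

```python
import numpy as np

def apply_ceiling_z_max(vertices: np.ndarray, z_limit: float) -> np.ndarray:
    """Drop points at or above z_limit, falling back to <= so the cloud never empties."""
    keep = vertices[:, 2] < z_limit
    if not np.any(keep):              # every point at/above the limit: keep ties instead
        keep = vertices[:, 2] <= z_limit
    if np.any(keep) and np.count_nonzero(keep) < vertices.shape[0]:
        return vertices[keep]
    return vertices

pts = np.array([[0.0, 0.0, 0.5], [0.0, 0.0, 2.4], [0.0, 0.0, 3.1]])
trimmed = apply_ceiling_z_max(pts, 2.5)   # keeps the two points below 2.5 m
```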
stream3r/worker/__init__.py CHANGED
@@ -8,9 +8,10 @@ _settings = WorkerSettings.from_env()
  if _settings.default_job_timeout and _settings.default_job_timeout > 0:
      Queue.DEFAULT_TIMEOUT = _settings.default_job_timeout
  
- from .tasks import model_build_job, pose_pointmap_job  # noqa: E402
+ from .tasks import keyframe_selection_job, model_build_job, pose_pointmap_job  # noqa: E402
  
  __all__ = [
      "pose_pointmap_job",
      "model_build_job",
+     "keyframe_selection_job",
  ]
stream3r/worker/config.py CHANGED
@@ -68,6 +68,7 @@ class WorkerSettings:
  
      pose_queue: str = "pose_pointmap"
      model_queue: str = "model_build"
+     keyframe_queue: str = "keyframe_selection"
  
      gpu_lock_key: str = "gpu:lock"
      gpu_lock_timeout: int = 3600
@@ -113,6 +114,7 @@ class WorkerSettings:
  
      scene_media_api_base_url: str | None = None
      scene_media_api_token: str | None = None
+     scene_media_api_secret: str | None = None
      scene_media_page_size: int = 200
      stream_window_size: int = 14
      max_frames_per_job: int = 0
@@ -128,7 +130,10 @@ class WorkerSettings:
      keyframe_coverage_voxel_size: float = 0.05
      keyframe_coverage_max_points: int = 5000
      keyframe_min_gain_ratio: float = 0.01
-     keyframe_full_mode_max_frames: int = 16
+     keyframe_full_mode_max_frames: int = 24
+     keyframe_extract_fps: float = 6.0
+     keyframe_extract_max_frames: int = 1200
+     keyframe_upload_dir: str = "images/keyframes"
  
      @classmethod
      def from_env(cls) -> "WorkerSettings":
@@ -143,6 +148,7 @@ class WorkerSettings:
          ),
          "pose_queue": os.getenv("STREAM3R_QUEUE_POSE", base.pose_queue),
          "model_queue": os.getenv("STREAM3R_QUEUE_MODEL", base.model_queue),
+         "keyframe_queue": os.getenv("STREAM3R_QUEUE_KEYFRAME", base.keyframe_queue),
          "gpu_lock_key": os.getenv("STREAM3R_GPU_LOCK_KEY", base.gpu_lock_key),
          "gpu_lock_timeout": _env_int("STREAM3R_GPU_LOCK_TIMEOUT", base.gpu_lock_timeout),
          "gpu_lock_blocking_timeout": _env_int(
@@ -212,6 +218,13 @@ class WorkerSettings:
              default=base.scene_media_api_token,
          )
          or None,
+         "scene_media_api_secret": _env_value(
+             "STREAM3R_MEDIA_API_SECRET",
+             "MEDIA_API_SECRET",
+             "INTERNAL_NOTIFY_SECRET",
+             default=base.scene_media_api_secret,
+         )
+         or None,
          "scene_media_page_size": _env_int(
              "STREAM3R_MEDIA_PAGE_SIZE", base.scene_media_page_size
          ),
@@ -260,6 +273,15 @@ class WorkerSettings:
          "keyframe_full_mode_max_frames": _env_int(
              "STREAM3R_KEYFRAME_FULL_MAX_FRAMES", base.keyframe_full_mode_max_frames
          ),
+         "keyframe_extract_fps": float(
+             os.getenv("STREAM3R_KEYFRAME_EXTRACT_FPS", base.keyframe_extract_fps)
+         ),
+         "keyframe_extract_max_frames": _env_int(
+             "STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", base.keyframe_extract_max_frames
+         ),
+         "keyframe_upload_dir": os.getenv(
+             "STREAM3R_KEYFRAME_UPLOAD_DIR", base.keyframe_upload_dir
+         ),
      }
  
      return cls(**kwargs)
stream3r/worker/keyframes.py ADDED
@@ -0,0 +1,408 @@
+ """Key frame selection utilities."""
+ 
+ from __future__ import annotations
+ 
+ import logging
+ from dataclasses import dataclass, field
+ from datetime import datetime
+ from pathlib import Path
+ from typing import Any, Iterable, Mapping
+ 
+ import cv2
+ import numpy as np
+ 
+ from .config import WorkerSettings
+ from .pipeline import run_stream3r_inference
+ from .runtime import WorkerRuntime
+ 
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass(slots=True)
+ class FrameRecord:
+     index: int
+     frame_id: str
+     path: Path
+     source: str | None = None
+     timestamp: str | None = None
+     metadata: dict[str, Any] = field(default_factory=dict)
+ 
+ 
+ @dataclass(slots=True)
+ class KeyframeSelectionResult:
+     indices: list[int]
+     diagnostics: list[dict[str, Any]]
+     top_k: int
+ 
+ 
+ def pose_confidence(predictions: Mapping[str, np.ndarray]) -> np.ndarray | None:
+     if "world_points_conf" in predictions:
+         return np.asarray(predictions["world_points_conf"], dtype=np.float32)
+     if "depth_conf" in predictions:
+         return np.asarray(predictions["depth_conf"], dtype=np.float32)
+     return None
+ 
+ 
+ def _camera_poses(extrinsic: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
+     matrices = np.asarray(extrinsic, dtype=np.float64)
+     if matrices.ndim != 3 or matrices.shape[1:] != (3, 4):
+         raise ValueError("Extrinsic array must have shape (N, 3, 4)")
+     count = matrices.shape[0]
+     rotations = np.empty((count, 3, 3), dtype=np.float64)
+     translations = np.empty((count, 3), dtype=np.float64)
+     for idx in range(count):
+         mat = np.eye(4, dtype=np.float64)
+         mat[:3, :4] = matrices[idx]
+         cam_to_world = np.linalg.inv(mat)
+         rotations[idx] = cam_to_world[:3, :3]
+         translations[idx] = cam_to_world[:3, 3]
+     return rotations, translations
+ 
+ 
+ def _compute_motion_deltas(rotations: np.ndarray, translations: np.ndarray, rot_weight: float) -> np.ndarray:
+     count = rotations.shape[0]
+     deltas = np.zeros(count, dtype=np.float64)
+     if count <= 1:
+         return deltas
+     for idx in range(1, count):
+         delta_t = np.linalg.norm(translations[idx] - translations[idx - 1])
+         rel = rotations[idx - 1].T @ rotations[idx]
+         trace = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
+         delta_r = float(np.arccos(trace))
+         deltas[idx] = delta_t + rot_weight * delta_r
+     return deltas
+ 
+ 
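The rotation term of the motion delta is the geodesic angle recovered from the trace of the relative rotation; a standalone check of that formula (the helper name is illustrative).

```python
import numpy as np

def rotation_angle(r_prev: np.ndarray, r_next: np.ndarray) -> float:
    """Geodesic angle between consecutive rotations, as used in the motion delta."""
    rel = r_prev.T @ r_next
    trace = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(trace))

identity = np.eye(3)
# 90-degree rotation about the z axis
rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
angle = rotation_angle(identity, rz90)   # pi/2
```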
+ def _hash_quantized_voxels(coords: np.ndarray) -> np.ndarray:
+     coords = coords.astype(np.int64, copy=False)
+     primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
+     return coords @ primes
+ 
+ 
+ def _frame_voxel_sets(
+     world_points: np.ndarray,
+     confidence: np.ndarray,
+     *,
+     threshold: float,
+     voxel_size: float,
+     max_points: int,
+ ) -> tuple[list[set[int]], int]:
+     rng = np.random.default_rng(42)
+     frames = world_points.shape[0]
+     voxel_sets: list[set[int]] = []
+     global_union: set[int] = set()
+     if voxel_size <= 0.0:
+         return [set() for _ in range(frames)], 0
+     for idx in range(frames):
+         conf_frame = confidence[idx]
+         mask = conf_frame >= threshold
+         if not np.any(mask):
+             voxel_sets.append(set())
+             continue
+         points = world_points[idx][mask]
+         if points.shape[0] > max_points:
+             sample_idx = rng.choice(points.shape[0], max_points, replace=False)
+             points = points[sample_idx]
+         quantized = np.floor(points / voxel_size).astype(np.int64, copy=False)
+         hashes = np.unique(_hash_quantized_voxels(quantized))
+         voxel_set = set(int(v) for v in hashes.tolist())
+         voxel_sets.append(voxel_set)
+         global_union.update(voxel_set)
+     return voxel_sets, len(global_union)
+ 
+ 
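The coverage metric quantizes points into voxels and hashes each cell with the same prime triple; a minimal demonstration (function name illustrative) showing two nearby points collapsing into one cell.

```python
import numpy as np

def voxel_hashes(points: np.ndarray, voxel_size: float) -> set[int]:
    """Quantize points to a voxel grid and hash each occupied cell."""
    quantized = np.floor(points / voxel_size).astype(np.int64)
    primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
    return set(int(h) for h in np.unique(quantized @ primes))

# Two points in the same 5 cm voxel collapse to one hash; the third lands elsewhere.
pts = np.array([[0.01, 0.01, 0.01], [0.02, 0.02, 0.02], [1.0, 1.0, 1.0]])
cells = voxel_hashes(pts, 0.05)
```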
+ def _select_motion_indices(
+     motion_deltas: np.ndarray,
+     *,
+     threshold: float,
+     min_gap: int,
+     max_gap: int,
+ ) -> tuple[list[int], dict[int, dict[str, float]]]:
+     total_frames = motion_deltas.shape[0]
+     if total_frames == 0:
+         return [], {}
+     selected = [0]
+     diagnostics: dict[int, dict[str, float]] = {0: {"motion_delta": 0.0, "cum_motion": 0.0}}
+     cumulative = 0.0
+     gap = 0
+     for idx in range(1, total_frames):
+         delta = float(motion_deltas[idx])
+         cumulative += delta
+         gap += 1
+         if gap < max(1, min_gap):
+             continue
+         should_select = cumulative >= threshold
+         if max_gap > 0 and gap >= max_gap:
+             should_select = True
+         if should_select:
+             selected.append(idx)
+             diagnostics[idx] = {"motion_delta": delta, "cum_motion": cumulative}
+             cumulative = 0.0
+             gap = 0
+     if selected[-1] != total_frames - 1:
+         selected.append(total_frames - 1)
+         diagnostics.setdefault(total_frames - 1, {"motion_delta": float(motion_deltas[-1]), "cum_motion": cumulative})
+     return selected, diagnostics
+ 
+ 
+ def select_keyframes_motion_coverage(
+     frame_records: list[FrameRecord],
+     predictions: Mapping[str, np.ndarray],
+     settings: WorkerSettings,
+     requested_top_k: int,
+ ) -> KeyframeSelectionResult | None:
+     extrinsic = np.asarray(predictions.get("extrinsic"))
+     if extrinsic.size == 0:
+         return None
+     rotations, translations = _camera_poses(extrinsic)
+     motion_deltas = _compute_motion_deltas(rotations, translations, settings.keyframe_rotation_weight)
+     motion_indices, motion_diag = _select_motion_indices(
+         motion_deltas,
+         threshold=settings.keyframe_motion_threshold,
+         min_gap=max(1, settings.keyframe_min_gap_frames),
+         max_gap=max(0, settings.keyframe_max_gap_frames),
+     )
+ 
+     total_frames = len(frame_records)
+     confidence = pose_confidence(predictions)
+     world_points = predictions.get("world_points")
+     if world_points is None:
+         world_points = predictions.get("world_points_from_depth")
+ 
+     voxel_sets: list[set[int]] = [set() for _ in range(total_frames)]
+     total_voxels = 0
+     mean_conf = np.zeros(total_frames, dtype=np.float32)
+     if confidence is not None:
+         mean_conf = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
+ 
+     if confidence is not None and world_points is not None:
+         voxel_sets, total_voxels = _frame_voxel_sets(
+             np.asarray(world_points),
+             np.asarray(confidence),
+             threshold=settings.keyframe_coverage_confidence,
+             voxel_size=settings.keyframe_coverage_voxel_size,
+             max_points=max(1000, settings.keyframe_coverage_max_points),
+         )
+ 
+     total_voxels = max(total_voxels, 1)
+     top_k = requested_top_k if requested_top_k > 0 else settings.keyframe_default_top_k
+     top_k = max(min(top_k, total_frames), len(motion_indices))
+ 
+     selected_set: set[int] = set(motion_indices)
+     diagnostics: dict[int, dict[str, Any]] = {}
+     covered: set[int] = set()
+ 
+     for idx in motion_indices:
+         gain_count = len(voxel_sets[idx] - covered) if voxel_sets[idx] else 0
+         gain_ratio = gain_count / total_voxels
+         covered.update(voxel_sets[idx])
+         diagnostics[idx] = {
+             "frame_id": frame_records[idx].frame_id,
+             "frame_index": frame_records[idx].index,
+             "reason": "motion",
+             "motion_delta": float(motion_deltas[idx]),
+             "cum_motion": float(motion_diag.get(idx, {}).get("cum_motion", 0.0)),
+             "coverage_gain_ratio": float(gain_ratio),
+             "coverage_gain_count": int(gain_count),
+             "mean_confidence": float(mean_conf[idx]) if confidence is not None else None,
+         }
+ 
+     if len(selected_set) < top_k and total_voxels > 0:
+         min_gain_ratio = settings.keyframe_min_gain_ratio
+         remaining = [i for i in range(total_frames) if i not in selected_set and voxel_sets[i]]
+         while remaining and len(selected_set) < top_k:
+             best_idx = -1
+             best_gain = -1
+             best_ratio = -1.0
+             for idx in remaining:
+                 gain = len(voxel_sets[idx] - covered)
+                 if gain <= 0:
+                     continue
+                 ratio = gain / total_voxels
+                 if ratio > best_ratio or (np.isclose(ratio, best_ratio) and gain > best_gain):
+                     best_idx = idx
+                     best_gain = gain
+                     best_ratio = ratio
+             if best_idx == -1 or best_ratio < min_gain_ratio:
+                 break
+             selected_set.add(best_idx)
+             covered.update(voxel_sets[best_idx])
+             diagnostics[best_idx] = {
+                 "frame_id": frame_records[best_idx].frame_id,
+                 "frame_index": frame_records[best_idx].index,
+                 "reason": "coverage",
+                 "motion_delta": float(motion_deltas[best_idx]),
+                 "cum_motion": float(motion_diag.get(best_idx, {}).get("cum_motion", 0.0)),
+                 "coverage_gain_ratio": float(best_ratio),
+                 "coverage_gain_count": int(best_gain),
+                 "mean_confidence": float(mean_conf[best_idx]) if confidence is not None else None,
+             }
+             remaining.remove(best_idx)
+ 
+     if requested_top_k > 0 and len(selected_set) > requested_top_k:
+         coverage_candidates = [idx for idx in selected_set if diagnostics[idx]["reason"] == "coverage"]
+         coverage_candidates.sort(key=lambda idx: diagnostics[idx].get("coverage_gain_ratio", 0.0))
+         while len(selected_set) > requested_top_k and coverage_candidates:
+             drop_idx = coverage_candidates.pop(0)
+             selected_set.remove(drop_idx)
+             diagnostics.pop(drop_idx, None)
+ 
+     final_indices = sorted(selected_set)
+     final_diags = [diagnostics[idx] for idx in final_indices]
+     return KeyframeSelectionResult(indices=final_indices, diagnostics=final_diags, top_k=len(final_indices))
+ 
+ 
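The coverage phase is a classic greedy max-coverage loop: repeatedly take the frame whose voxel set adds the most unseen cells, stopping below a minimum gain ratio. A stripped-down sketch of that loop (names illustrative, no tie-breaking on raw gain):

```python
def greedy_coverage(voxel_sets: list[set[int]], top_k: int, min_gain_ratio: float) -> list[int]:
    """Greedily pick frames whose voxel sets add the most unseen coverage."""
    total = len(set().union(*voxel_sets)) or 1
    covered: set[int] = set()
    chosen: list[int] = []
    remaining = list(range(len(voxel_sets)))
    while remaining and len(chosen) < top_k:
        best_idx, best_gain = -1, 0
        for idx in remaining:
            gain = len(voxel_sets[idx] - covered)
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx == -1 or best_gain / total < min_gain_ratio:
            break  # no frame adds enough new coverage
        chosen.append(best_idx)
        covered.update(voxel_sets[best_idx])
        remaining.remove(best_idx)
    return chosen

sets = [{1, 2, 3}, {2, 3}, {4, 5, 6, 7}, {7, 8}]
picked = greedy_coverage(sets, top_k=3, min_gain_ratio=0.05)
```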
+ def run_keyframe_prepass(
+     *,
+     runtime: WorkerRuntime,
+     payload: Mapping[str, Any],
+     frame_records: list[FrameRecord],
+     mode: str,
+     streaming: bool,
+     window_size: int | None,
+ ) -> KeyframeSelectionResult | None:
+     if len(frame_records) <= 1:
+         return None
+ 
+     settings = runtime.settings
+     top_k_payload = int(payload.get("prepass_top_k") or payload.get("top_k_frames") or payload.get("top_k") or 0)
+ 
+     try:
+         inference = run_stream3r_inference(
+             runtime=runtime,
+             image_paths=[record.path for record in frame_records],
+             mode=mode,
+             streaming=streaming,
+             cache_output_path=None,
+             progress_cb=None,
+             window_size=window_size if streaming and mode == "window" else None,
+         )
+     except Exception:
+         logger.exception("Keyframe pre-pass inference failed")
+         return None
+ 
+     try:
+         return select_keyframes_motion_coverage(
+             frame_records,
+             inference.predictions,
+             settings,
+             requested_top_k=top_k_payload,
+         )
+     finally:
+         del inference
+ 
+ 
+ def extract_video_frames(
+     video_path: Path,
+     output_dir: Path,
+     *,
+     target_fps: float | None = None,
+     max_frames: int | None = None,
+ ) -> tuple[list[FrameRecord], float]:
+     if not video_path.exists():
+         raise FileNotFoundError(f"Video file not found: {video_path}")
+ 
+     output_dir.mkdir(parents=True, exist_ok=True)
+     cap = cv2.VideoCapture(str(video_path))
+     if not cap.isOpened():
+         raise RuntimeError(f"Failed to open video: {video_path}")
+ 
+     native_fps = cap.get(cv2.CAP_PROP_FPS)
+     if not native_fps or native_fps <= 0:
+         native_fps = 30.0
+     frame_interval = 1
+     if target_fps and target_fps > 0:
+         frame_interval = max(1, int(round(native_fps / target_fps)))
+ 
+     frame_records: list[FrameRecord] = []
+     total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
+     frame_idx = 0
+     extracted = 0
+     success, frame = cap.read()
+     while success:
+         if frame_idx % frame_interval == 0:
+             frame_id = f"frame_{extracted:06d}"
+             frame_path = output_dir / f"{frame_id}.jpg"
+             if not cv2.imwrite(str(frame_path), frame):
+                 cap.release()
+                 raise RuntimeError(f"Failed to write frame: {frame_path}")
+             timestamp_s = frame_idx / native_fps
+             frame_records.append(
+                 FrameRecord(
+                     index=extracted,
+                     frame_id=frame_id,
+                     path=frame_path,
+                     metadata={"frame_number": frame_idx, "timestamp_s": timestamp_s},
+                 )
+             )
+             extracted += 1
+             if max_frames and extracted >= max_frames:
+                 break
+         frame_idx += 1
+         success, frame = cap.read()
+ 
+     cap.release()
+     if not frame_records:
+         raise RuntimeError("No frames extracted from video")
+ 
+     return frame_records, native_fps
+ 
+ 
+ def linear_sample_indices(total: int, desired: int) -> list[int]:
+     if desired <= 0 or total <= desired:
+         return list(range(total))
+     step = total / desired
+     return [min(total - 1, int(round(i * step))) for i in range(desired)]
+ 
+ 
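The linear fallback sampler can be exercised directly; the body below is copied from the diff so its behavior can be checked standalone.

```python
def linear_sample_indices(total: int, desired: int) -> list[int]:
    """Evenly spaced frame indices, clamped to the last frame."""
    if desired <= 0 or total <= desired:
        return list(range(total))
    step = total / desired
    return [min(total - 1, int(round(i * step))) for i in range(desired)]

picks = linear_sample_indices(100, 5)   # every 20th frame
```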
+ def build_keyframe_uploads(
+     runtime: WorkerRuntime,
+     scene_id: str,
+     selected_records: Iterable[FrameRecord],
+     diagnostics: list[dict[str, Any]],
+     *,
+     subdir: str,
+ ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+     diag_by_index = {entry.get("frame_index"): entry for entry in diagnostics}
+     storage_entries: list[dict[str, Any]] = []
+     media_entries: list[dict[str, Any]] = []
+ 
+     for record in selected_records:
+         diag = diag_by_index.get(record.index, {})
+         filename = f"{record.frame_id}.jpg"
+         key = runtime.storage.build_key(scene_id, subdir, filename)
+         uri = runtime.storage.upload_file(record.path, key, content_type="image/jpeg")
+         storage_entries.append(
+             {
+                 "frame_id": record.frame_id,
+                 "frame_index": record.index,
+                 "url": uri,
+                 "storage_key": key,
+                 "diagnostics": diag,
+             }
+         )
+ 
+         media_entries.append(
+             {
+                 "media_type": "image",
+                 "file": key,
+                 "captured_at": _diagnostic_captured_at(record, diag),
+             }
+         )
+ 
+     return storage_entries, media_entries
+ 
+ 
+ def _diagnostic_captured_at(record: FrameRecord, diag: Mapping[str, Any]) -> str | None:
+     if record.timestamp:
+         return record.timestamp
+     ts = diag.get("timestamp") or record.metadata.get("timestamp")
+     if isinstance(ts, str):
+         return ts
+     if isinstance(ts, (int, float)):
+         return datetime.utcfromtimestamp(float(ts)).isoformat() + "Z"
+     timestamp_s = record.metadata.get("timestamp_s")
+     if isinstance(timestamp_s, (int, float)):
+         return datetime.utcfromtimestamp(float(timestamp_s)).isoformat() + "Z"
+     return None
stream3r/worker/main.py CHANGED
@@ -77,7 +77,7 @@ def main() -> None:
  if settings.default_job_timeout and settings.default_job_timeout > 0:
      Queue.DEFAULT_TIMEOUT = settings.default_job_timeout
  
- args = _parse_args([settings.pose_queue, settings.model_queue])
+ args = _parse_args([settings.pose_queue, settings.model_queue, settings.keyframe_queue])
  logging.basicConfig(level=getattr(logging, str(args.log_level).upper(), logging.INFO))
  
  runtime = get_runtime()
stream3r/worker/tasks.py CHANGED
@@ -11,7 +11,6 @@ import shutil
11
  import tempfile
12
  import traceback
13
  import uuid
14
- from dataclasses import dataclass, field
15
  from datetime import datetime, timezone
16
  from pathlib import Path
17
  from contextlib import nullcontext
@@ -24,6 +23,15 @@ from rq import get_current_job
24
 
25
  from stream3r.utils.visual_utils import predictions_to_glb
26
 
 
 
 
 
 
 
 
 
 
27
  from .pipeline import InferenceResult, run_stream3r_inference
28
  from .runtime import WorkerRuntime, get_runtime
29
 
@@ -51,25 +59,6 @@ def _as_int(value: Any, default: int) -> int:
51
  return int(value)
52
  except (TypeError, ValueError):
53
  return default
54
-
55
-
56
- @dataclass(slots=True)
57
- class FrameRecord:
58
- index: int
59
- frame_id: str
60
- path: Path
61
- source: str | None = None
62
- timestamp: str | None = None
63
- metadata: dict[str, Any] = field(default_factory=dict)
64
-
65
-
66
- @dataclass(slots=True)
67
- class KeyframeSelectionResult:
68
- indices: list[int]
69
- diagnostics: list[dict[str, Any]]
70
- top_k: int
71
-
72
-
73
  class ProgressTracker:
74
  """Aggregates frame progress to percentage updates."""
75
 
@@ -124,6 +113,53 @@ def _write_base64(content: str, destination: Path) -> None:
124
  destination.write_bytes(data)
125
 
126
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
127
  def _resolve_frame_entry(entry: Any, *, index: int, dest_dir: Path) -> FrameRecord:
128
  metadata: dict[str, Any] = {}
129
  timestamp = None
@@ -364,16 +400,6 @@ def _collect_frames_from_scene_media(
364
  offset += request_limit
365
 
366
  return records
367
-
368
-
369
- def _pose_confidence(predictions: Mapping[str, np.ndarray]) -> np.ndarray | None:
370
- if "world_points_conf" in predictions:
371
- return np.asarray(predictions["world_points_conf"], dtype=np.float32)
372
- if "depth_conf" in predictions:
373
- return np.asarray(predictions["depth_conf"], dtype=np.float32)
374
- return None
375
-
376
-
377
  def _save_pointmaps(
378
  *,
379
  runtime: WorkerRuntime,
@@ -389,7 +415,7 @@ def _save_pointmaps(
389
  raise RuntimeError("Predictions missing world points")
390
 
391
  world_points = np.asarray(world_points)
392
- confidence = _pose_confidence(predictions)
393
  if confidence is None:
394
  confidence = np.ones(world_points.shape[:-1], dtype=np.float32)
395
 
@@ -669,7 +695,7 @@ def _select_keyframes_motion_coverage(
669
  max_gap=max(0, settings.keyframe_max_gap_frames),
670
  )
671
  total_frames = len(frame_records)
672
- confidence = _pose_confidence(predictions)
673
  world_points = predictions.get("world_points")
674
  if world_points is None:
675
  world_points = predictions.get("world_points_from_depth")
@@ -756,7 +782,7 @@ def _compute_selected_frames(
756
  ) -> list[dict[str, Any]]:
757
  if top_k <= 0:
758
  return []
759
- confidence = _pose_confidence(predictions)
760
  if confidence is None:
761
  return []
762
  scores = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
@@ -831,6 +857,11 @@ def _save_scene_glb(
831
  ceiling_margin_value = float(ceiling_margin_value) if ceiling_margin_value is not None else 0.05
832
  except (TypeError, ValueError):
833
  ceiling_margin_value = 0.05
 
 
 
 
 
834
 
835
  scene = predictions_to_glb(
836
  dict(predictions),
@@ -844,6 +875,7 @@ def _save_scene_glb(
844
  prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
845
  ceiling_percentile=ceiling_percentile_value,
846
  ceiling_margin=ceiling_margin_value,
 
847
  )
848
  scene.export(file_obj=str(local_file))
849
  key = runtime.storage.build_key(
@@ -1302,6 +1334,174 @@ def model_build_job(payload: Mapping[str, Any]) -> dict[str, Any]:
1302
  return _execute_job("model_build", payload, _handle_model_build)
1303
 
1304
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1305
  def _handle_model_build(
1306
  *,
1307
  runtime: WorkerRuntime,
 
11
  import tempfile
12
  import traceback
13
  import uuid
 
14
  from datetime import datetime, timezone
15
  from pathlib import Path
16
  from contextlib import nullcontext
 
23
 
24
  from stream3r.utils.visual_utils import predictions_to_glb
25
 
26
+ from .keyframes import (
27
+ FrameRecord,
28
+ KeyframeSelectionResult,
29
+ build_keyframe_uploads,
30
+ extract_video_frames,
31
+ linear_sample_indices,
32
+ pose_confidence,
33
+ run_keyframe_prepass,
34
+ )
35
  from .pipeline import InferenceResult, run_stream3r_inference
36
  from .runtime import WorkerRuntime, get_runtime
37
 
 
59
  return int(value)
60
  except (TypeError, ValueError):
61
  return default
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
62
  class ProgressTracker:
63
  """Aggregates frame progress to percentage updates."""
64
 
 
113
  destination.write_bytes(data)
114
 
115
 
116
+ def _register_scene_media_entries(runtime: WorkerRuntime, scene_id: str, entries: list[dict[str, Any]]) -> None:
117
+ if not entries:
118
+ return
119
+ base_url = runtime.settings.scene_media_api_base_url
120
+ if not base_url:
121
+ logger.info("Scene media API base URL not configured; skipping registration for %s", scene_id)
122
+ return
123
+
124
+ url = f"{base_url.rstrip('/')}/scenes/{scene_id}/media"
125
+ headers: dict[str, str] = {"Content-Type": "application/json"}
126
+ token = runtime.settings.scene_media_api_token
127
+ if token:
128
+ headers["Authorization"] = f"Bearer {token}"
129
+ secret = runtime.settings.scene_media_api_secret
130
+ if secret:
131
+ headers["x-internal-secret"] = secret
132
+
133
+ try:
134
+ response = requests.post(url, json={"entries": entries}, headers=headers, timeout=30)
135
+ if response.status_code == 405:
136
+ logger.info(
137
+ "Scene media API does not accept POST at %s (status %s); skipping registration",
138
+ url,
139
+ response.status_code,
140
+ )
141
+ return
142
+ response.raise_for_status()
143
+ except requests.HTTPError as exc:
144
+ status = exc.response.status_code if exc.response is not None else None
145
+ if status == 422:
146
+ logger.warning(
147
+ "Scene media API rejected payload for scene %s (422): %s",
148
+ scene_id,
149
+ exc.response.text if exc.response is not None else "",
150
+ )
151
+ return
152
+ if status == 500:
153
+ logger.warning(
154
+ "Scene media API encountered server error (500) for scene %s; skipping registration",
155
+ scene_id,
156
+ )
157
+ return
158
+ logger.exception("Failed to register scene media entries for scene %s", scene_id)
159
+ except requests.RequestException:
160
+ logger.exception("Failed to register scene media entries for scene %s", scene_id)
161
+
162
+
163
  def _resolve_frame_entry(entry: Any, *, index: int, dest_dir: Path) -> FrameRecord:
164
  metadata: dict[str, Any] = {}
165
  timestamp = None
 
400
  offset += request_limit
401
 
402
  return records
 
 
 
 
 
 
 
 
 
 
403
  def _save_pointmaps(
404
  *,
405
  runtime: WorkerRuntime,
 
         raise RuntimeError("Predictions missing world points")
 
     world_points = np.asarray(world_points)
+    confidence = pose_confidence(predictions)
     if confidence is None:
         confidence = np.ones(world_points.shape[:-1], dtype=np.float32)
 
         max_gap=max(0, settings.keyframe_max_gap_frames),
     )
     total_frames = len(frame_records)
+    confidence = pose_confidence(predictions)
     world_points = predictions.get("world_points")
     if world_points is None:
         world_points = predictions.get("world_points_from_depth")
 
 ) -> list[dict[str, Any]]:
     if top_k <= 0:
         return []
+    confidence = pose_confidence(predictions)
     if confidence is None:
         return []
     scores = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
 
         ceiling_margin_value = float(ceiling_margin_value) if ceiling_margin_value is not None else 0.05
     except (TypeError, ValueError):
         ceiling_margin_value = 0.05
+    ceiling_z_max = payload.get("ceiling_z_max")
+    try:
+        ceiling_z_max_value = float(ceiling_z_max) if ceiling_z_max is not None else None
+    except (TypeError, ValueError):
+        ceiling_z_max_value = None
 
     scene = predictions_to_glb(
         dict(predictions),
 
         prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
         ceiling_percentile=ceiling_percentile_value,
         ceiling_margin=ceiling_margin_value,
+        ceiling_z_max=ceiling_z_max_value,
     )
     scene.export(file_obj=str(local_file))
     key = runtime.storage.build_key(
 
     return _execute_job("model_build", payload, _handle_model_build)
 
 
+def _fallback_selection(frame_records: list[FrameRecord], top_k: int) -> KeyframeSelectionResult:
+    indices = linear_sample_indices(len(frame_records), top_k)
+    diagnostics = [
+        {
+            "frame_id": frame_records[idx].frame_id,
+            "frame_index": frame_records[idx].index,
+            "reason": "linear",
+        }
+        for idx in indices
+    ]
+    return KeyframeSelectionResult(indices=indices, diagnostics=diagnostics, top_k=len(indices))
+
+
+def keyframe_selection_job(payload: Mapping[str, Any]) -> dict[str, Any]:
+    runtime = get_runtime()
+    job = get_current_job()
+    payload = dict(payload)
+
+    job_id = str(payload.get("job_id") or (job.id if job else uuid.uuid4()))
+    scene_id = payload.get("scene_id")
+    if not scene_id:
+        raise ValueError("Keyframe job payload is missing 'scene_id'")
+    video_key = payload.get("video_key")
+    if not video_key:
+        raise ValueError("Keyframe job payload is missing 'video_key'")
+
+    job_type = "keyframe_selection"
+    job_meta = {
+        "job_id": job_id,
+        "job_type": job_type,
+        "scene_id": scene_id,
+    }
+
+    sanitized_payload = {
+        "scene_id": scene_id,
+        "video_key": video_key,
+        "top_k": payload.get("top_k"),
+        "extract_fps": payload.get("extract_fps"),
+        "extract_max_frames": payload.get("extract_max_frames"),
+    }
+
+    runtime.db.upsert_job(
+        job_id=job_id,
+        job_type=job_type,
+        scene_id=scene_id,
+        status="started",
+        payload=sanitized_payload,
+    )
+    runtime_emit(
+        runtime,
+        {
+            **job_meta,
+            "status": "started",
+            "progress": 0,
+            "ts": datetime.now(timezone.utc).timestamp(),
+        },
+    )
+
+    start_time = perf_counter()
+
+    try:
+        with tempfile.TemporaryDirectory(prefix=f"keyframe_{job_id}_") as tmp_dir:
+            temp_path = Path(tmp_dir)
+            video_path = temp_path / "input_video"
+            runtime.storage.download_to_path(video_key, video_path)
+
+            extract_fps = payload.get("extract_fps")
+            try:
+                extract_fps_value = float(extract_fps) if extract_fps is not None else runtime.settings.keyframe_extract_fps
+            except (TypeError, ValueError):
+                extract_fps_value = runtime.settings.keyframe_extract_fps
+
+            max_frames = _as_int(
+                payload.get("extract_max_frames"),
+                runtime.settings.keyframe_extract_max_frames,
+            )
+
+            frame_records, native_fps = extract_video_frames(
+                video_path,
+                temp_path / "frames",
+                target_fps=extract_fps_value,
+                max_frames=max_frames,
+            )
+
+            selection = run_keyframe_prepass(
+                runtime=runtime,
+                payload=payload,
+                frame_records=frame_records,
+                mode="window",
+                streaming=True,
+                window_size=runtime.settings.stream_window_size,
+            )
+            if selection is None or not selection.indices:
+                requested_top_k = _as_int(payload.get("top_k"), runtime.settings.keyframe_default_top_k)
+                selection = _fallback_selection(frame_records, requested_top_k)
+
+            selected_records = [frame_records[i] for i in selection.indices]
+            storage_entries, media_entries = build_keyframe_uploads(
+                runtime,
+                scene_id,
+                selected_records,
+                selection.diagnostics,
+                subdir=runtime.settings.keyframe_upload_dir,
+            )
+
+            _register_scene_media_entries(runtime, scene_id, media_entries)
+
+            result_payload = {
+                "job_id": job_id,
+                "job_type": job_type,
+                "scene_id": scene_id,
+                "video_key": video_key,
+                "native_fps": native_fps,
+                "total_frames": len(frame_records),
+                "selected_frames": storage_entries,
+                "selection": selection.diagnostics,
+            }
+
+    except Exception as exc:
+        error_text = traceback.format_exc()
+        runtime.db.upsert_job(
+            job_id=job_id,
+            job_type=job_type,
+            scene_id=scene_id,
+            status="failed",
+            error=error_text,
+        )
+        runtime_emit(
+            runtime,
+            {
+                **job_meta,
+                "status": "failed",
+                "ts": datetime.now(timezone.utc).timestamp(),
+                "error": str(exc),
+            },
+        )
+        logger.exception("Keyframe selection job %s failed", job_id)
+        raise
+
+    runtime.db.upsert_job(
+        job_id=job_id,
+        job_type=job_type,
+        scene_id=scene_id,
+        status="finished",
+        result=result_payload,
+    )
+
+    runtime_emit(
+        runtime,
+        {
+            **job_meta,
+            "status": "finished",
+            "progress": 100,
+            "ts": datetime.now(timezone.utc).timestamp(),
+        },
+    )
+
+    logger.info(
+        "Keyframe selection job %s finished in %.2fs (selected %d/%d frames)",
+        job_id,
+        perf_counter() - start_time,
+        len(selection.indices),
+        len(frame_records),
+    )
+
+    return result_payload
+
+
 def _handle_model_build(
     *,
     runtime: WorkerRuntime,
tests/test_voxel_reduction.py CHANGED
@@ -222,3 +222,47 @@ def test_predictions_to_glb_ceiling_filter():
     point_cloud = next(iter(scene.geometry.values()))
     max_z = point_cloud.vertices[:, 2].max()
     assert max_z < 1.6
+
+
+def test_predictions_to_glb_ceiling_absolute_cut():
+    world_points = np.array(
+        [
+            [
+                [[0.0, 0.0, 0.5], [0.0, 0.0, 1.0]],
+                [[0.0, 0.0, 1.2], [0.0, 0.0, 2.0]],
+            ]
+        ],
+        dtype=np.float32,
+    )
+    predictions = {
+        "world_points": world_points,
+        "world_points_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "world_points_from_depth": world_points,
+        "depth_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "images": np.ones((1, 2, 2, 3), dtype=np.float32) * 0.5,
+        "extrinsic": np.array(
+            [
+                [
+                    [1.0, 0.0, 0.0, 0.0],
+                    [0.0, 1.0, 0.0, 0.0],
+                    [0.0, 0.0, 1.0, 0.0],
+                ]
+            ],
+            dtype=np.float32,
+        ),
+    }
+
+    scene = predictions_to_glb(
+        predictions,
+        conf_thres=0.0,
+        voxel_size=None,
+        o3d_denoise=False,
+        density_filter=False,
+        reinflate_enabled=False,
+        ceiling_z_max=1.1,
+    )
+
+    assert isinstance(scene, trimesh.Scene)
+    point_cloud = next(iter(scene.geometry.values()))
+    max_z = point_cloud.vertices[:, 2].max()
+    assert max_z <= 1.1
worker/stream3r/jobs.py CHANGED
@@ -4,12 +4,13 @@ from __future__ import annotations
 
 from typing import Any, Callable, Mapping
 
-from stream3r.worker.tasks import model_build_job, pose_pointmap_job
+from stream3r.worker.tasks import keyframe_selection_job, model_build_job, pose_pointmap_job
 
 
 _HANDLERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {
     "pose_pointmap": pose_pointmap_job,
     "model_build": model_build_job,
+    "keyframe_selection": keyframe_selection_job,
 }
 
 
@@ -42,7 +43,7 @@ def handle_job(*args: Any, **kwargs: Any) -> Any:
     if isinstance(candidate, Mapping):
         payload = candidate
 
-    if payload is None and isinstance(args[0], Mapping):
+    if payload is None and args and isinstance(args[0], Mapping):
         payload = args[0]
     job_type = str(payload.get("job_type")) if payload else job_type
 
@@ -60,4 +61,3 @@ def handle_job(*args: Any, **kwargs: Any) -> Any:
         raise ValueError(f"Unsupported job_type '{job_type}'")
 
     return handler(payload)
-
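
For reference, the dispatch pattern this commit extends can be sketched in isolation. The handler and registry names below mirror the diff, but the job body is a stub standing in for the real worker task, and the runtime wiring (RQ, storage, DB) is omitted:

```python
from typing import Any, Callable, Mapping


def keyframe_selection_job(payload: Mapping[str, Any]) -> dict[str, Any]:
    # Stub: the real task downloads the video, runs the prepass, and
    # uploads the selected keyframes.
    return {"job_type": "keyframe_selection", "scene_id": payload["scene_id"]}


# Registry mapping job_type strings to handlers, as in _HANDLERS above.
_HANDLERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {
    "keyframe_selection": keyframe_selection_job,
}


def handle_job(*args: Any, **kwargs: Any) -> Any:
    # Accept the payload either as a keyword or as the first positional
    # argument; the `args and` guard is the bug fix from this commit.
    payload: Mapping[str, Any] | None = kwargs.get("payload")
    if payload is None and args and isinstance(args[0], Mapping):
        payload = args[0]
    if payload is None:
        raise ValueError("No payload supplied")
    job_type = str(payload.get("job_type"))
    handler = _HANDLERS.get(job_type)
    if handler is None:
        raise ValueError(f"Unsupported job_type '{job_type}'")
    return handler(payload)


result = handle_job({"job_type": "keyframe_selection", "scene_id": "scene-123"})
```

Registering the new task is then a one-line addition to `_HANDLERS`, with no change to the dispatch logic itself.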