Commit 01e8928 (parent: de1bede): split key framer out
Changed files:
- design_docs/keyframe_service.md +178 -0
- docs/api_usage.md +598 -0
- notes.md +2 -0
- stream3r/utils/__pycache__/visual_utils.cpython-311.pyc +0 -0
- stream3r/utils/visual_utils.py +16 -0
- stream3r/worker/__init__.py +2 -1
- stream3r/worker/config.py +23 -1
- stream3r/worker/keyframes.py +408 -0
- stream3r/worker/main.py +1 -1
- stream3r/worker/tasks.py +233 -33
- tests/test_voxel_reduction.py +44 -0
- worker/stream3r/jobs.py +3 -3
design_docs/keyframe_service.md
ADDED
@@ -0,0 +1,178 @@
# Keyframe Selection Service

**Author:** Brian Clark
**Last Updated:** 2025-11-07
**Audience:** Integrators needing motion-aware keyframes instead of linear sampling

---

## Overview

The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).

The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.

---

## Job Inputs

| Field | Required | Description |
|-------|----------|-------------|
| `scene_id` | ✅ | Scene identifier used in `scene_media` |
| `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
| `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
| `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
| `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
| Optional filters | optional | e.g., `ceiling_percentile`, `ceiling_z_max` for downstream GLB |

Example payload:

```json
{
  "job_type": "keyframe_selection",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "top_k": 16,
  "extract_fps": 6.0
}
```

Enqueue via RQ:

```python
from rq import Queue
from redis import Redis

queue = Queue("keyframe_selection", connection=Redis.from_url("redis://localhost:6379/0"))
queue.enqueue("worker.stream3r.jobs.handle_job", {
    "job_type": "keyframe_selection",
    "scene_id": "scene-123",
    "video_key": "scene-data/videos/scene-123.mp4"
})
```

---

## Processing Pipeline

1. **Download & Extract Frames**
   - Video pulled via `runtime.storage.download_to_path`.
   - Frames decoded with OpenCV at `extract_fps` (default 6 fps) up to `extract_max_frames`.

2. **Motion + Coverage Pre-pass**
   - Lightweight Stream3R windowed inference collects poses (`extrinsic`) and confidences.
   - Motion scoring: translation plus weighted rotation deltas; a greedy pass ensures pose diversity.
   - Coverage scoring: counts high-confidence voxel IDs contributed per frame and greedily maximizes new coverage.
   - Diagnostics stored per frame (reason, motion delta, coverage gain, confidence).
   - If inference fails, the worker falls back to linear sampling.

3. **Selection & Upload**
   - Selected frames are copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
   - Scene media rows inserted through the `record_scene_media_entries` API.
   - A manifest (`selected_frames`) is returned with storage keys and diagnostics.

4. **Result**
   - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics.
   - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
   - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs and skips registration without failing the job.
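To illustrate the motion half of the pre-pass, a greedy translation-only selector might look like the sketch below. This is a simplified illustration, not the shipped code: the real logic in `stream3r/worker/keyframes.py` also weighs rotation deltas and coverage gain, and `select_keyframes` is a hypothetical helper name.

```python
import math

def select_keyframes(positions, motion_thresh=0.4, top_k=16):
    """Greedily keep frames whose camera moved enough since the last kept frame.

    positions: list of (x, y, z) camera translations, one per extracted frame.
    Returns indices of selected frames (frame 0 is always kept as the anchor).
    """
    selected = [0]
    last = positions[0]
    for i, pos in enumerate(positions[1:], start=1):
        delta = math.dist(pos, last)  # translation magnitude since last keyframe
        if delta >= motion_thresh:
            selected.append(i)
            last = pos
        if len(selected) >= top_k:
            break  # selection budget exhausted
    return selected
```

The real pre-pass layers coverage scoring on top, so a frame with little motion can still be kept if it contributes new high-confidence voxels.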

---

## Outputs & Diagnostics

Result payload (simplified):

```json
{
  "job_id": "...",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "native_fps": 29.97,
  "total_frames": 420,
  "selected_frames": [
    {
      "frame_id": "frame_000012",
      "frame_index": 12,
      "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
      "diagnostics": {
        "reason": "motion",
        "motion_delta": 0.42,
        "coverage_gain_ratio": 0.08,
        "mean_confidence": 0.67
      }
    }
  ]
}
```

Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
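For example, a downstream consumer can pull the storage keys out of the manifest before presigning or registering them (a sketch over the payload shape above; `keyframe_keys` is an illustrative helper, not part of the worker API):

```python
def keyframe_keys(result: dict) -> list:
    """Collect the storage keys of all selected keyframes from a job result."""
    return [frame["storage_key"] for frame in result.get("selected_frames", [])]
```

Feeding those keys to the presign or media endpoints is then a one-liner per key.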

---

## Configuration

Environment variables for fine-tuning:

| Env Var | Default | Notes |
|---------|---------|-------|
| `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
| `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
| `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
| `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
| `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
| `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
| `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
| `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
| `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
| `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when the frame count is below this |

Scene media API requirements:
- `STREAM3R_MEDIA_API_BASE_URL`
- `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts

Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.

---

## Integration Steps for External Services

1. **Deploy Worker Queue**
   - Run the Stream3R worker with `--queue keyframe_selection` (the default when `STREAM3R_QUEUE_KEYFRAME` is set).
   - Note that the pre-pass runs Stream3R inference; CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`, or schedule the queue on GPU hosts.

2. **Enqueue Jobs**
   - Replace existing linear sampling code with an RQ enqueue call.
   - Store job IDs if you need to poll job status or consume events.

3. **Consume Results**
   - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries.
   - Inspect returned diagnostics for debugging or to render navigation overlays.

4. **Fallback Handling**
   - If the job fails, the queue returns error details; you can revert to your existing sampler.
   - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
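One simple retry policy is to shrink the selection budget on each attempt. A sketch (the `retry_payload` helper and the floor of 4 frames are illustrative choices, not part of the worker):

```python
def retry_payload(payload, attempt):
    """Copy a keyframe_selection payload, halving top_k on each retry attempt."""
    retry = dict(payload)  # shallow copy so the original payload is untouched
    retry["top_k"] = max(4, payload.get("top_k", 16) // (2 ** attempt))
    return retry
```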

---

## Benefits vs. Linear Sampling

- **Fewer redundant frames**: motion-aware spacing ensures pose diversity.
- **Better geometry coverage**: only keeps frames that add new high-confidence voxels.
- **Consistent diagnostics**: each selected frame includes reasons and confidence, aiding QA.
- **Automatic uploads**: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.

---

## Future Enhancements

- Optional semantic filtering to avoid ceilings/walls.
- Exposure of thumbnails or depth maps alongside keyframes.
- Batch selection across multiple videos.

---

For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.
docs/api_usage.md
ADDED
@@ -0,0 +1,598 @@
# Scene Graph Manager API Usage

This document gives a concise, high-signal overview of the HTTP API so other services can integrate without reading the whole codebase.

## Refresh Checklist (for future updates)

Use this prompt when endpoints change:

> Run `rg "@app" -n api/app.py` and list new/modified routes. Update `docs/api_usage.md` with:
> - Endpoint table (method, path, summary)
> - Auth / header requirements
> - Request & response JSON samples (curl when useful)
> - Notes on query params, error behaviors, and streaming endpoints.
> Re-run `python -m compileall api/app.py` and ensure the doc still reflects reality.

## Base URL

All paths below are relative to:

```
https://scene-graph-mgr-api.fly.dev
```

## Authentication

- Public endpoints are unauthenticated but subject to Fly.io rate limits.
- Internal write endpoints require a shared secret header: `x-internal-secret: <secret>`.
- Standard error payload:

```json
{
  "detail": "Human readable message",
  "error_code": "optional-machine-code"
}
```

4xx indicates caller issues (validation, missing scene). 5xx means server-side failure.

## Endpoint Map

| Method | Path | Summary |
| --- | --- | --- |
| GET | `/healthz` | Liveness check. |
| GET | `/scenes` | List scenes with summary metadata (optional `include_empty`). |
| PUT | `/scenes/{scene_id}` | Overwrite a scene with a full graph payload. |
| PATCH | `/scenes/{scene_id}` | Apply an RFC 6902 patch to the latest graph. |
| POST | `/scenes/{scene_id}/create` | Seed an empty scene (optionally overwrite). |
| GET | `/scenes/{scene_id}/versions/latest` | Latest graph version. |
| GET | `/scenes/{scene_id}/versions` | Metadata for all versions. |
| GET | `/scenes/{scene_id}/versions/{version_id}` | Raw version record (JSON). |
| POST | `/scenes/{scene_id}/add_image` | Upload image bytes and enqueue processing. |
| POST | `/scenes/{scene_id}/add_images_from_keys` | Enqueue processing for existing S3 keys. |
| POST | `/scenes/{scene_id}/upload_video` | Upload video and queue frame extraction batch. |
| POST | `/scenes/{scene_id}/upload_image` | Browser upload → WebP resize → presigned URL. |
| GET | `/scenes/{scene_id}/instances` | List stored instances for the scene (status filters optional). |
| GET | `/scenes/{scene_id}/objects/{obj_id}/beliefs` | Latest improver belief for an object. |
| GET | `/scenes/{scene_id}/objects/{obj_id}/instances` | List stored instances for an object (status filters optional). |
| GET | `/scenes/{scene_id}/improver/backlog` | Inspect improver queue/backlog for a scene. |
| POST | `/scenes/{scene_id}/objects/{obj_id}/instances/{instance_id}/improve_instance` | Queue improver workflow for an instance. |
| GET | `/scenes/{scene_id}/runlogs` | Recent improver run logs (filterable). |
| GET | `/scenes/{scene_id}/change-requests` | Inspect change request queue (pending/applied). |
| GET | `/scenes/{scene_id}/change-summaries` | Stream applied change summaries (newest first). |
| GET | `/scenes/{scene_id}/diff` | Structural diff between two versions (machine patch & stats). |
| GET | `/scenes/{scene_id}/diff-semantic` | Semantic diff (ID-aware ops). |
| GET | `/scenes/{scene_id}/images/presign` | Generate presigned GET URL for an image key. |
| GET | `/scenes/{scene_id}/media` | List recorded media assets (images/videos). |
| POST | `/scenes/{scene_id}/media` | Upsert scene media entries supplied by workers or services. |
| GET | `/jobs/{job_id}` | Inspect queued/finished RQ jobs. |
| POST | `/stream3r/jobs` | Submit a reconstruction job (pose pointmap or model build). |
| GET | `/stream3r/jobs/{job_id}` | Inspect a Stream3R job record. |
| GET | `/stream3r/jobs` | List Stream3R jobs (filter by scene, type, status). |
| GET | `/stream3r/jobs/{job_id}/events` | Server-Sent Events feed for job lifecycle updates. |
| GET | `/stream3r/models/{scene_id}/presign` | Presign the latest model_build scene.glb for download. |
| POST | `/internal/commit-version` | Worker/internal scene commit (requires secret). |
| GET | `/debug/s3-ping` | Smoke-test S3/B2 credentials. |
| WS | `/ws?channel=all\|{scene_id}` | Broadcasts scene events over WebSocket. |

---

## Endpoint Details & Examples

### Health & Diagnostics

#### `GET /healthz`
Returns `{ "ok": true, "ts": "ISO timestamp" }` for uptime monitoring.

#### `GET /debug/s3-ping`
Verifies object storage connectivity using a put/delete round-trip. Good for smoke tests.

### Scene Lifecycle

#### `GET /scenes`
List scenes. Optional `include_empty=true` includes scenes with no versions yet.

```bash
curl "https://scene-graph-mgr-api.fly.dev/scenes?include_empty=true"
```

Response contains summaries with counts and most recent version metadata.

#### `PUT /scenes/{scene_id}`
Replace the entire scene graph.

Request body (`SceneGraphEnvelope`):
```json
{
  "scene_location_id": "scene-123",
  "scene_graph": {"objects": [], "relations": []},
  "meta": {"source": "manual"}
}
```
Creates a new version record and broadcasts `scene.put` on the WebSocket.

#### `PATCH /scenes/{scene_id}`
Apply an RFC 6902 patch to the latest graph. The optional `base_version` guard prevents lost updates.

```json
{
  "scene_location_id": "scene-123",
  "json_patch": [
    {"op": "add", "path": "/objects/-", "value": {"id": "chair-1", "attributes": {}}}
  ],
  "base_version": 1728400000000
}
```

#### `POST /scenes/{scene_id}/create`
Seeds an empty graph. `overwrite=true` allows reseeding existing scenes.

#### Version Reads
- `GET /scenes/{scene_id}/versions/latest` → latest full graph.
- `GET /scenes/{scene_id}/versions` → list of `{version_id, created_at, bytes}`.
- `GET /scenes/{scene_id}/versions/{version_id}` → raw record (graph + metadata).

### Image Upload & Processing

#### `POST /scenes/{scene_id}/add_image`
Multipart upload (`file=@image.jpg`). Stores bytes via the S3-compatible API, seeds the scene if empty, and enqueues `worker.tasks.process_image_for_scene`. Response includes the RQ `job_id` and filename.

#### `POST /scenes/{scene_id}/add_images_from_keys`
JSON body:
```json
{
  "keys": ["scenes/scene-123/images/20241008/a.png"],
  "room_hint": "living_room",
  "prompt": "describe objects",
  "bounding_boxes": {"a.png": [[0.1,0.2,0.5,0.8]]}
}
```
Seeds the scene if needed and enqueues a batch worker job.

Response (`AddImagesFromKeysResponse`):

```json
{
  "scene_location_id": "scene-123",
  "queued_at": "2024-10-08T14:12:05Z",
  "job_id": "rq-job-id",
  "keys": [
    "scenes/scene-123/images/20241008/a.png",
    "scenes/scene-123/images/20241008/b.png"
  ]
}
```

#### `POST /scenes/{scene_id}/upload_image`
For browser uploads. Accepts a multipart file, resizes to WebP (max width 1024), uploads, and returns:
```json
{
  "scene_location_id": "scene-123",
  "key": "scenes/scene-123/images/20241008/abc.webp",
  "url": "https://...presigned...",
  "width": 768,
  "height": 512,
  "bytes": 123456
}
```

#### `GET /scenes/{scene_id}/images/presign`
Query parameters: `key` (must reside under `scenes/{scene_id}/images/…` **or** `scenes/{scene_id}/videos/…`) and optional `expires`. Returns `{ "url": "...", "expires_in": 900 }`.
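Because presigning is restricted to those prefixes, callers can validate keys before making the request. A client-side sketch of the rule above (`presignable` is an illustrative helper, not part of the API):

```python
def presignable(scene_id: str, key: str) -> bool:
    """True if the key falls under the scene's images/ or videos/ prefixes."""
    prefixes = (f"scenes/{scene_id}/images/", f"scenes/{scene_id}/videos/")
    return key.startswith(prefixes)
```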

### Video Upload & Frame Extraction

#### `POST /scenes/{scene_id}/upload_video`
Accepts a multipart video upload (e.g., `file=@walkthrough.mp4`). Stores the binary in object storage and seeds an empty scene when needed. The worker enqueues `process_video_for_scene`, which:

- extracts WebP frames at 2 fps by default (`frame_interval=0.5` seconds) and stores them under `scenes/{scene_id}/images/video_frames/...` (the source remains under `scenes/{scene_id}/videos/...`) so the `/images/presign` endpoint can serve them;
- retries with a software H.264 transcode if AV1 or other codecs fail to decode on the host;
- publishes the same `scene.update` event stream as still-image uploads.

Response payload (`UploadVideoResponse`):

```json
{
  "scene_location_id": "scene-123",
  "key": "scenes/scene-123/videos/20241008/abcd1234.mp4",
  "filename": "walkthrough.mp4",
  "size_bytes": 456789012,
  "content_type": "video/mp4",
  "queued_at": "2024-10-08T14:12:05Z",
  "job_id": "rq-job-id"
}
```
Clients should poll `GET /jobs/{job_id}` to track progress. When `job.result.frame_keys` is present, frame extraction has succeeded.
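A polling loop over that convention might look like the sketch below. Only the `result.frame_keys` convention comes from the note above; the rest of the job payload shape, the helper names, and the timing defaults are assumptions.

```python
import json
import time
import urllib.request

BASE_URL = "https://scene-graph-mgr-api.fly.dev"

def frames_ready(job: dict) -> bool:
    """True once the job result carries extracted frame keys."""
    return bool((job.get("result") or {}).get("frame_keys"))

def wait_for_frames(job_id: str, timeout: float = 300.0, interval: float = 5.0):
    """Poll GET /jobs/{job_id} until frame_keys appear or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}") as resp:
            job = json.load(resp)
        if frames_ready(job):
            return job["result"]["frame_keys"]
        time.sleep(interval)
    raise TimeoutError(f"video job {job_id} still running after {timeout}s")
```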

### Scene Media

#### `GET /scenes/{scene_id}/media`
Lists stored media records for a scene. Use `media_type=image` to filter to keyframes, or `media_type=video` for source clips.

```bash
curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/media?media_type=image&limit=50"
```

Response (`SceneMediaListResponse`) includes the total for the returned slice plus ISO timestamps normalized to UTC.

#### `POST /scenes/{scene_id}/media`
Upserts media entries (usually called by workers after uploading files). Payload:

```json
{
  "entries": [
    {
      "file": "scenes/scene-123/keyframes/frame_000012.jpg",
      "media_type": "image",
      "captured_at": "2024-10-08T14:12:05Z"
    },
    {
      "file": "scenes/scene-123/videos/20241008/abcd1234.mp4",
      "media_type": "video"
    }
  ]
}
```

- `media_type` is optional—when omitted the server infers it from the filename (`image` vs `video`).
- `captured_at` accepts ISO-8601 strings (UTC preferred). If omitted, the server stores `now()`.
- Existing rows are updated in-place thanks to an upsert on the `file` column.
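A worker-side helper for building that payload could look like this (illustrative; only `file` is required, matching the notes above, and `media_entry` is a hypothetical name):

```python
from datetime import datetime, timezone
from typing import Optional

def media_entry(file_key: str, media_type: Optional[str] = None,
                captured_at: Optional[datetime] = None) -> dict:
    """Build one entry for POST /scenes/{scene_id}/media."""
    entry = {"file": file_key}
    if media_type:  # omit to let the server infer image vs video from the filename
        entry["media_type"] = media_type
    if captured_at:  # normalized to ISO-8601 UTC, as the endpoint prefers
        entry["captured_at"] = captured_at.astimezone(timezone.utc).isoformat()
    return entry

payload = {"entries": [
    media_entry("scenes/scene-123/keyframes/frame_000012.jpg", "image"),
    media_entry("scenes/scene-123/videos/20241008/abcd1234.mp4"),
]}
```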
|
| 239 |
+
|
| 240 |
+
Response (`SceneMediaBatchResponse`) summarizes how many entries were accepted:
|
| 241 |
+
|
| 242 |
+
```json
|
| 243 |
+
{
|
| 244 |
+
"scene_id": "scene-123",
|
| 245 |
+
"accepted": 2,
|
| 246 |
+
"skipped": 0,
|
| 247 |
+
"files": [
|
| 248 |
+
"scenes/scene-123/keyframes/frame_000012.jpg",
|
| 249 |
+
"scenes/scene-123/videos/20241008/abcd1234.mp4"
|
| 250 |
+
]
|
| 251 |
+
}
|
| 252 |
+
```
|
| 253 |
+
|
| 254 |
+
### Improver Monitoring
|
| 255 |
+
|
| 256 |
+
#### `GET /scenes/{scene_id}/improver/backlog`
|
| 257 |
+
Summarizes outstanding improver work for a scene. Default statuses are `pending`, `queued`, and `processing`, plus records that do not yet have a status. Use this to detect when the improver has drained the backlog.
|
| 258 |
+
|
| 259 |
+
Query parameters:
|
| 260 |
+
- `limit` (default `200`, max `2000`) — cap the number of instances returned.
|
| 261 |
+
- `status` — optional list to override which statuses count as “not yet processed.” Omit to use the defaults.
|
| 262 |
+
- `include_missing_status` (default `true`) — include rows with `status IS NULL`.
|
| 263 |
+
|
| 264 |
+
Response:
|
| 265 |
+
|
| 266 |
+
```json
|
| 267 |
+
{
|
| 268 |
+
"scene_id": "scene-123",
|
| 269 |
+
"count": 2,
|
| 270 |
+
"instances": [
|
| 271 |
+
{
|
| 272 |
+
"scene_id": "scene-123",
|
| 273 |
+
"obj_id": "obj-1",
|
| 274 |
+
"instance_id": "inst-42",
|
| 275 |
+
"status": "pending",
|
| 276 |
+
"status_reason": "ambient",
|
| 277 |
+
"status_changed_at": "2024-10-08T14:15:06Z",
|
| 278 |
+
"last_event_at": "2024-10-08T14:15:06Z",
|
| 279 |
+
"created_at": "2024-10-08T13:59:40Z",
|
| 280 |
+
"data": {
|
| 281 |
+
"image_id": "img-998",
|
| 282 |
+
"bbox_xyxy": [0.1, 0.2, 0.5, 0.8]
|
| 283 |
+
}
|
| 284 |
+
}
|
| 285 |
+
]
|
| 286 |
+
}
|
| 287 |
+
```
|
| 288 |
+
When `count` is zero the improver has no pending work for that scene.
|
| 289 |
+
|
| 290 |
+
### Improver & Beliefs
|
| 291 |
+
|
| 292 |
+
#### `POST /scenes/{scene_id}/objects/{obj_id}/instances/{instance_id}/improve_instance`
|
| 293 |
+
Queues improver workflow. Body adheres to `InstanceEvent` schema. Example:
|
| 294 |
+
|
| 295 |
+
```bash
|
| 296 |
+
curl -X POST \
|
| 297 |
+
-H "Content-Type: application/json" \
|
| 298 |
+
-d '{
|
| 299 |
+
"scene_id": "scene-123",
|
| 300 |
+
"obj_id": "sofa-1",
|
| 301 |
+
"instance_id": "inst-456",
|
| 302 |
+
"image_id": "scenes/scene-123/images/20241008/sofa.png",
|
| 303 |
+
"bbox_xyxy": [0.1, 0.3, 0.6, 0.9]
|
| 304 |
+
}' \
|
| 305 |
+
https://scene-graph-mgr-api.fly.dev/scenes/scene-123/objects/sofa-1/instances/inst-456/improve_instance
|
| 306 |
+
```
|
| 307 |
+
|
| 308 |
+
Response: `{ "enqueued": true, "job_id": "rq-job-id" }`.
|
| 309 |
+
|
| 310 |
+
The worker embeds, seeds Qdrant, and calls `scene_improver.tasks.improve_scene`.
|
| 311 |
+
|
| 312 |
+
#### `GET /scenes/{scene_id}/objects/{obj_id}/beliefs`
|
| 313 |
+
Returns latest belief payload (name Dirichlet, attribute betas, relations) or 404 if none stored.
|
| 314 |
+
|
| 315 |
+
#### `GET /scenes/{scene_id}/instances`
|
| 316 |
+
Returns persisted instances across every object in the scene. Query params:
|
| 317 |
+
- `limit` (1–5000, default 1000)
|
| 318 |
+
- `status` (optional, repeatable) — only include instances whose `status` matches one of the provided values.
|
| 319 |
+
- `exclude_status` (optional, repeatable) — omit instances whose `status` matches any of the provided values (e.g., `exclude_status=superseded` to skip reassigned instances).
|
| 320 |
+
|
| 321 |
+
Response mirrors the storage map:
|
| 322 |
+
|
| 323 |
+
```json
|
| 324 |
+
{
|
| 325 |
+
"scene_id": "scene-123",
|
| 326 |
+
"count": 5,
|
| 327 |
+
"objects": {
|
| 328 |
+
"sofa-1": [{...}, {...}],
|
| 329 |
+
"lamp-4": [{...}]
|
| 330 |
+
}
|
| 331 |
+
}
|
| 332 |
+
```
|
| 333 |
+
|
| 334 |
+
#### `GET /scenes/{scene_id}/objects/{obj_id}/instances`
|
| 335 |
+
Returns the persisted instance rows for the object. Query params:
|
| 336 |
+
- `limit` (1–1000, default 100)
|
| 337 |
+
- `status` (optional, repeatable) — only include instances whose `status` matches one of the provided values.
|
| 338 |
+
- `exclude_status` (optional, repeatable) — omit instances whose `status` matches any of the provided values.
|
| 339 |
+
|
| 340 |
+
Example request skipping superseded items:
|
| 341 |
+
|
| 342 |
+
```bash
|
| 343 |
+
curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/objects/sofa-1/instances?limit=20&exclude_status=superseded"
|
| 344 |
+
```
|
| 345 |
+
|
| 346 |
+
Response:
|
| 347 |
+
|
| 348 |
+
```json
|
| 349 |
+
{
|
| 350 |
+
"scene_id": "scene-123",
|
| 351 |
+
"obj_id": "sofa-1",
|
| 352 |
+
"count": 2,
|
| 353 |
+
"instances": [
|
| 354 |
+
{
|
| 355 |
+
"id": "inst-456",
|
| 356 |
+
"image_id": "scenes/...",
|
| 357 |
+
"bbox_xyxy": [0.1,0.3,0.6,0.9],
|
| 358 |
+
"captured_at": 1728401000,
|
| 359 |
+
"status": "processed"
|
| 360 |
+
},
|
| 361 |
+
{
|
| 362 |
+
"id": "inst-123",
|
| 363 |
+
"image_id": "scenes/...",
|
| 364 |
+
"bbox_xyxy": [0.05,0.2,0.4,0.7],
|
| 365 |
+
"status": "pending"
|
| 366 |
+
}
|
| 367 |
+
]
|
| 368 |
+
}
|
| 369 |
+
```
|
| 370 |
+
|
| 371 |
+
#### `GET /scenes/{scene_id}/runlogs`
|
| 372 |
+
Query params:
|
| 373 |
+
- `limit` (1–1000, default 100)
|
| 374 |
+
- `obj_id` (optional filter)
|
| 375 |
+
- `instance_id` (optional filter)
|
| 376 |
+
|
| 377 |
+
Response contains `records` newest-first with `step`, `message`, `data`, timestamps, and run IDs.
|
| 378 |
+
|
#### `GET /scenes/{scene_id}/change-requests`
Inspect the change request queue. Query params:
- `state` (optional: `pending`, `applied`, `stale`)
- `limit` (1–200, default 50)

Returns an array mirroring the DB record with preconditions, payload, result, and the new `applied_summary` field:

```json
[
  {
    "request_id": "95e8...",
    "scene_id": "scene-123",
    "obj_id": "table-1",
    "requested_by": "belief-agent",
    "state": "applied",
    "confidence": 0.92,
    "payload": {"operations": [...]},
    "result": {"summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""},
    "applied_at": "2024-11-19T20:14:32.123Z",
    "applied_summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""
  }
]
```

#### `GET /scenes/{scene_id}/change-summaries`
Lightweight feed of applied change blurbs, ordered by `applied_at` descending. Supports `limit` (1–200, default 50).

```bash
curl "https://scene-graph-mgr-api.fly.dev/scenes/scene-123/change-summaries?limit=20"
```

```json
[
  {
    "request_id": "95e8...",
    "scene_id": "scene-123",
    "obj_id": "table-1",
    "applied_version": 1729300042000,
    "applied_at": "2024-11-19T20:14:32.123Z",
    "summary": "Updated object \"wooden table\" attribute \"size\" from \"medium\" to \"large\""
  },
  {
    "request_id": "73f1...",
    "scene_id": "scene-123",
    "summary": "Renamed object \"wooden table\" from \"table\" to \"dining table\""
  }
]
```

Use this endpoint for ambient “stream of consciousness” UIs showing how the improver evolves the scene.

### Diffs

#### `GET /scenes/{scene_id}/diff`
Compare two versions (`from_version`, `to_version`). Optional `mode` query (`patch`, `summary`, `both`). Returns a machine patch plus stats when requested.

#### `GET /scenes/{scene_id}/diff-semantic`
ID-aware diff returning ordered operations (append/remove/replace) with summary stats.

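For scripted comparisons, the two-version diff call can be composed like this. A hedged sketch: the query parameter names come from the endpoint description above, while the `diff_url` helper is illustrative only:

```python
from urllib.parse import urlencode

BASE = "https://scene-graph-mgr-api.fly.dev"

def diff_url(scene_id, from_version, to_version, mode="both"):
    # mode selects the documented output shape: "patch", "summary", or "both".
    if mode not in {"patch", "summary", "both"}:
        raise ValueError(f"unsupported diff mode: {mode}")
    query = urlencode({"from_version": from_version, "to_version": to_version, "mode": mode})
    return f"{BASE}/scenes/{scene_id}/diff?{query}"
```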
### Stream3R Reconstruction Jobs

The Stream3R API wraps the reconstruction workers (pose pointmaps and full models) and provides idempotent enqueue, polling, and event streaming.

#### `POST /stream3r/jobs`
Submit a reconstruction job. Supported `job_type` values are `pose_pointmap` and `model_build`. Provide at least one frame (`url` or `path`) and an optional `client_request_id` for idempotency (subsequent calls reuse the existing job and return `200 OK`).

```bash
curl -X POST https://scene-graph-mgr-api.fly.dev/stream3r/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "pose_pointmap",
    "scene_id": "scene-123",
    "frames": [
      {"url": "https://cdn.example/scene-123/frame_0000.webp"},
      {"path": "/mnt/captures/scene-123/frame_0001.png"}
    ],
    "session_settings": {"prediction_mode": "pointmap"},
    "client_request_id": "scene-123-20241008-run1"
  }'
```

Response (`202 Accepted` on first submission):

```json
{
  "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
  "job_type": "pose_pointmap",
  "scene_id": "scene-123",
  "status": "queued",
  "created_at": "2024-10-08T14:12:05.417Z",
  "payload": {
    "job_type": "pose_pointmap",
    "scene_id": "scene-123",
    "frames": [
      {"url": "https://cdn.example/scene-123/frame_0000.webp"},
      {"path": "/mnt/captures/scene-123/frame_0001.png"}
    ],
    "session_settings": {"prediction_mode": "pointmap"}
  }
}
```

#### `GET /stream3r/jobs/{job_id}`
Fetch the canonical job record (backed by Postgres). Fields include:
- `status`: `queued`, `started`, `progress`, `finished`, or `failed`
- `result`: worker-published artifact manifest (S3 URLs, local paths)
- `error`: error string when status is `failed`
- timestamps (`created_at`, `started_at`, `completed_at`)

Typical successful response:

```json
{
  "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
  "job_type": "model_build",
  "scene_id": "scene-123",
  "status": "finished",
  "created_at": "2024-10-08T14:12:05.417Z",
  "started_at": "2024-10-08T14:12:12.998Z",
  "completed_at": "2024-10-08T14:24:39.221Z",
  "result": {
    "model_dir": "s3://bucket/scene-123/stream3r/models/20241008",
    "summary_url": "s3://bucket/scene-123/stream3r/models/20241008/summary.json"
  },
  "error": null,
  "client_request_id": "scene-123-20241008-run1"
}
```

#### `GET /stream3r/jobs`
List jobs with optional filters:
- `scene_id`
- `job_type`
- `status`
- `limit` (1–200, default 50)
- `offset` (default 0)

Returns `{ "jobs": [...], "limit": 50, "offset": 0 }` with the same schema as the single-job response.

#### `GET /stream3r/jobs/{job_id}/events`
Server-Sent Events feed backed by Redis Streams. Use it for near-real-time updates in browser dashboards.

```bash
curl --no-buffer \
  -H "Accept: text/event-stream" \
  "https://scene-graph-mgr-api.fly.dev/stream3r/jobs/d8f8a3fc-3aed-441c-ac78-2b953a9229bf/events"
```

Events are emitted as standard SSE payloads:

```
id: 1728409930123-0
event: progress
data: {"job_id":"d8f8a3fc-3aed-441c-ac78-2b953a9229bf","status":"progress","progress":65}

id: 1728409960456-0
event: finished
data: {"job_id":"d8f8a3fc-3aed-441c-ac78-2b953a9229bf","status":"finished","result_url":"s3://bucket/.../summary.json"}
```

Reconnect with the last `id` to resume (`?last_id=<redis-stream-id>`). When the worker encounters an error the stream emits `event: failed` with an `error` field.

> **Implementation note:** the current worker stub still returns `status="failed"` with `error="stream3r worker not implemented"` until the GPU-backed handler ships. Downstream clients should surface the error text to operators and may retry later.

#### `GET /stream3r/models/{scene_id}/presign`
Fetch a presigned download URL for the most recent `model_build` job's `scene.glb`. Optional query params:
- `job_id` — force a specific job (must belong to the scene).
- `expires` — TTL in seconds (default 900, range 60–86400).

```bash
curl "https://scene-graph-mgr-api.fly.dev/stream3r/models/scene-123/presign?expires=600"
```

Response:

```json
{
  "scene_id": "scene-123",
  "job_id": "d8f8a3fc-3aed-441c-ac78-2b953a9229bf",
  "key": "scenes/scene-123/stream3r/models/20241008/scene.glb",
  "url": "https://s3.amazonaws.com/...signature...",
  "expires_in": 600
}
```

Returns `404` if no successful model build exists or the job did not publish a GLB artifact.

### Jobs & Internal Ops

#### `GET /jobs/{job_id}`
Inspect RQ job status. Returns timestamps, result payload (if finished), and a truncated stack trace when failed.

#### `POST /internal/commit-version`
Worker-only commit. Body:
```json
{
  "scene_location_id": "scene-123",
  "scene_graph": {...},
  "base_version": 1728400000000,
  "meta": {"source": "worker"}
}
```
Requires the `x-internal-secret` header when enabled. Creates a new version, broadcasts `scene.update`, and publishes on Redis pub/sub.

### WebSocket Stream

#### `WS /ws?channel=all|{scene_id}`
Receives JSON events when scenes change (`scene.create`, `scene.put`, `scene.patch`, `scene.update`). Use it to refresh UI state in real time.

---

## Notes & Best Practices

- All timestamps are UTC ISO strings.
- Scene writes (`PUT`, `PATCH`, `create`, `commit-version`) broadcast on WebSocket and publish to the Redis channel `scene_events`.
- Object storage paths follow `scenes/{scene_id}/images/...`; the presign endpoint enforces this prefix.
- Scene graph payloads no longer embed instance blobs; use the `/instances` endpoint for that data.
- Postgres storage is required; filesystem fallbacks have been removed from the API/worker flows.
- Queue jobs default to a 15-minute timeout (image) or 30-minute (batch). Track job progress via `/jobs/{job_id}` or the Redis CLI.
- Improver run logs are persisted to Postgres (if configured) and mirrored to JSONL under `SCENE_RUN_LOG_DIR`.
notes.md CHANGED
```diff
@@ -2,3 +2,5 @@
 
 **Manually Clear the GPU Lock from REDIS:
 redis-cli -u "$REDIS_URL" DEL gpu:lock
+
+export INTERNAL_NOTIFY_SECRET='82d4acd547e449fe';
```
stream3r/utils/__pycache__/visual_utils.cpython-311.pyc CHANGED
Binary files a/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc and b/stream3r/utils/__pycache__/visual_utils.cpython-311.pyc differ
stream3r/utils/visual_utils.py CHANGED
```diff
@@ -327,6 +327,7 @@ def predictions_to_glb(
     reinflate_seed: int | None = None,
     ceiling_percentile: float | None = None,
     ceiling_margin: float = 0.05,
+    ceiling_z_max: float | None = None,
 ) -> trimesh.Scene:
     """
     Converts predictions to a 3D scene represented as a GLB file.
@@ -364,6 +365,7 @@ def predictions_to_glb(
         reinflate_seed (Optional[int]): RNG seed for deterministic reinflation.
         ceiling_percentile (Optional[float]): Remove points above this Z percentile (0-100).
         ceiling_margin (float): Margin subtracted from percentile cutoff (meters).
+        ceiling_z_max (Optional[float]): Remove points with Z >= this absolute height (meters).
 
     Returns:
         trimesh.Scene: Processed 3D scene containing point cloud and cameras
@@ -544,6 +546,20 @@ def predictions_to_glb(
         colors_rgb = colors_rgb[keep_mask]
         conf_used = conf_used[keep_mask]
 
+    if ceiling_z_max is not None and vertices_3d.size:
+        try:
+            z_limit = float(ceiling_z_max)
+        except (TypeError, ValueError):
+            z_limit = None
+        if z_limit is not None:
+            keep_mask = vertices_3d[:, 2] < z_limit
+            if not np.any(keep_mask):
+                keep_mask = vertices_3d[:, 2] <= z_limit
+            if np.any(keep_mask) and np.count_nonzero(keep_mask) < vertices_3d.shape[0]:
+                vertices_3d = vertices_3d[keep_mask]
+                colors_rgb = colors_rgb[keep_mask]
+                conf_used = conf_used[keep_mask]
+
     if effective_voxel_size is not None and voxel_after_conf and vertices_3d.size:
         before_count = vertices_3d.shape[0]
         vertices_3d, colors_rgb, conf_used = voxel_reduce(
```
stream3r/worker/__init__.py CHANGED
```diff
@@ -8,9 +8,10 @@
 if _settings.default_job_timeout and _settings.default_job_timeout > 0:
     Queue.DEFAULT_TIMEOUT = _settings.default_job_timeout
 
-from .tasks import model_build_job, pose_pointmap_job  # noqa: E402
+from .tasks import keyframe_selection_job, model_build_job, pose_pointmap_job  # noqa: E402
 
 __all__ = [
     "pose_pointmap_job",
     "model_build_job",
+    "keyframe_selection_job",
 ]
```
stream3r/worker/config.py CHANGED
```diff
@@ -68,6 +68,7 @@ class WorkerSettings:
 
     pose_queue: str = "pose_pointmap"
     model_queue: str = "model_build"
+    keyframe_queue: str = "keyframe_selection"
 
    gpu_lock_key: str = "gpu:lock"
     gpu_lock_timeout: int = 3600
@@ -113,6 +114,7 @@ class WorkerSettings:
 
     scene_media_api_base_url: str | None = None
     scene_media_api_token: str | None = None
+    scene_media_api_secret: str | None = None
     scene_media_page_size: int = 200
     stream_window_size: int = 14
     max_frames_per_job: int = 0
@@ -128,7 +130,10 @@ class WorkerSettings:
     keyframe_coverage_voxel_size: float = 0.05
     keyframe_coverage_max_points: int = 5000
     keyframe_min_gain_ratio: float = 0.01
-    keyframe_full_mode_max_frames: int =
+    keyframe_full_mode_max_frames: int = 24
+    keyframe_extract_fps: float = 6.0
+    keyframe_extract_max_frames: int = 1200
+    keyframe_upload_dir: str = "images/keyframes"
 
     @classmethod
     def from_env(cls) -> "WorkerSettings":
@@ -143,6 +148,7 @@ class WorkerSettings:
             ),
             "pose_queue": os.getenv("STREAM3R_QUEUE_POSE", base.pose_queue),
             "model_queue": os.getenv("STREAM3R_QUEUE_MODEL", base.model_queue),
+            "keyframe_queue": os.getenv("STREAM3R_QUEUE_KEYFRAME", base.keyframe_queue),
             "gpu_lock_key": os.getenv("STREAM3R_GPU_LOCK_KEY", base.gpu_lock_key),
             "gpu_lock_timeout": _env_int("STREAM3R_GPU_LOCK_TIMEOUT", base.gpu_lock_timeout),
             "gpu_lock_blocking_timeout": _env_int(
@@ -212,6 +218,13 @@ class WorkerSettings:
                 default=base.scene_media_api_token,
             )
             or None,
+            "scene_media_api_secret": _env_value(
+                "STREAM3R_MEDIA_API_SECRET",
+                "MEDIA_API_SECRET",
+                "INTERNAL_NOTIFY_SECRET",
+                default=base.scene_media_api_secret,
+            )
+            or None,
             "scene_media_page_size": _env_int(
                 "STREAM3R_MEDIA_PAGE_SIZE", base.scene_media_page_size
             ),
@@ -260,6 +273,15 @@ class WorkerSettings:
             "keyframe_full_mode_max_frames": _env_int(
                 "STREAM3R_KEYFRAME_FULL_MAX_FRAMES", base.keyframe_full_mode_max_frames
             ),
+            "keyframe_extract_fps": float(
+                os.getenv("STREAM3R_KEYFRAME_EXTRACT_FPS", base.keyframe_extract_fps)
+            ),
+            "keyframe_extract_max_frames": _env_int(
+                "STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", base.keyframe_extract_max_frames
+            ),
+            "keyframe_upload_dir": os.getenv(
+                "STREAM3R_KEYFRAME_UPLOAD_DIR", base.keyframe_upload_dir
+            ),
         }
 
         return cls(**kwargs)
```
stream3r/worker/keyframes.py ADDED
@@ -0,0 +1,408 @@
```python
"""Key frame selection utilities."""

from __future__ import annotations

import logging
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Iterable, Mapping

import cv2
import numpy as np

from .config import WorkerSettings
from .pipeline import run_stream3r_inference
from .runtime import WorkerRuntime


logger = logging.getLogger(__name__)


@dataclass(slots=True)
class FrameRecord:
    index: int
    frame_id: str
    path: Path
    source: str | None = None
    timestamp: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(slots=True)
class KeyframeSelectionResult:
    indices: list[int]
    diagnostics: list[dict[str, Any]]
    top_k: int


def pose_confidence(predictions: Mapping[str, np.ndarray]) -> np.ndarray | None:
    if "world_points_conf" in predictions:
        return np.asarray(predictions["world_points_conf"], dtype=np.float32)
    if "depth_conf" in predictions:
        return np.asarray(predictions["depth_conf"], dtype=np.float32)
    return None


def _camera_poses(extrinsic: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    matrices = np.asarray(extrinsic, dtype=np.float64)
    if matrices.ndim != 3 or matrices.shape[1:] != (3, 4):
        raise ValueError("Extrinsic array must have shape (N, 3, 4)")
    count = matrices.shape[0]
    rotations = np.empty((count, 3, 3), dtype=np.float64)
    translations = np.empty((count, 3), dtype=np.float64)
    for idx in range(count):
        mat = np.eye(4, dtype=np.float64)
        mat[:3, :4] = matrices[idx]
        cam_to_world = np.linalg.inv(mat)
        rotations[idx] = cam_to_world[:3, :3]
        translations[idx] = cam_to_world[:3, 3]
    return rotations, translations


def _compute_motion_deltas(rotations: np.ndarray, translations: np.ndarray, rot_weight: float) -> np.ndarray:
    count = rotations.shape[0]
    deltas = np.zeros(count, dtype=np.float64)
    if count <= 1:
        return deltas
    for idx in range(1, count):
        delta_t = np.linalg.norm(translations[idx] - translations[idx - 1])
        rel = rotations[idx - 1].T @ rotations[idx]
        trace = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
        delta_r = float(np.arccos(trace))
        deltas[idx] = delta_t + rot_weight * delta_r
    return deltas


def _hash_quantized_voxels(coords: np.ndarray) -> np.ndarray:
    coords = coords.astype(np.int64, copy=False)
    primes = np.array([73856093, 19349663, 83492791], dtype=np.int64)
    return coords @ primes


def _frame_voxel_sets(
    world_points: np.ndarray,
    confidence: np.ndarray,
    *,
    threshold: float,
    voxel_size: float,
    max_points: int,
) -> tuple[list[set[int]], int]:
    rng = np.random.default_rng(42)
    frames = world_points.shape[0]
    voxel_sets: list[set[int]] = []
    global_union: set[int] = set()
    if voxel_size <= 0.0:
        return [set() for _ in range(frames)], 0
    for idx in range(frames):
        conf_frame = confidence[idx]
        mask = conf_frame >= threshold
        if not np.any(mask):
            voxel_sets.append(set())
            continue
        points = world_points[idx][mask]
        if points.shape[0] > max_points:
            sample_idx = rng.choice(points.shape[0], max_points, replace=False)
            points = points[sample_idx]
        quantized = np.floor(points / voxel_size).astype(np.int64, copy=False)
        hashes = np.unique(_hash_quantized_voxels(quantized))
        voxel_set = set(int(v) for v in hashes.tolist())
        voxel_sets.append(voxel_set)
        global_union.update(voxel_set)
    return voxel_sets, len(global_union)


def _select_motion_indices(
    motion_deltas: np.ndarray,
    *,
    threshold: float,
    min_gap: int,
    max_gap: int,
) -> tuple[list[int], dict[int, dict[str, float]]]:
    total_frames = motion_deltas.shape[0]
    if total_frames == 0:
        return [], {}
    selected = [0]
    diagnostics: dict[int, dict[str, float]] = {0: {"motion_delta": 0.0, "cum_motion": 0.0}}
    cumulative = 0.0
    gap = 0
    for idx in range(1, total_frames):
        delta = float(motion_deltas[idx])
        cumulative += delta
        gap += 1
        if gap < max(1, min_gap):
            continue
        should_select = cumulative >= threshold
        if max_gap > 0 and gap >= max_gap:
            should_select = True
        if should_select:
            selected.append(idx)
            diagnostics[idx] = {"motion_delta": delta, "cum_motion": cumulative}
            cumulative = 0.0
            gap = 0
    if selected[-1] != total_frames - 1:
        selected.append(total_frames - 1)
        diagnostics.setdefault(total_frames - 1, {"motion_delta": float(motion_deltas[-1]), "cum_motion": cumulative})
    return selected, diagnostics


def select_keyframes_motion_coverage(
    frame_records: list[FrameRecord],
    predictions: Mapping[str, np.ndarray],
    settings: WorkerSettings,
    requested_top_k: int,
) -> KeyframeSelectionResult | None:
    extrinsic = np.asarray(predictions.get("extrinsic"))
    if extrinsic.size == 0:
        return None
    rotations, translations = _camera_poses(extrinsic)
    motion_deltas = _compute_motion_deltas(rotations, translations, settings.keyframe_rotation_weight)
    motion_indices, motion_diag = _select_motion_indices(
        motion_deltas,
        threshold=settings.keyframe_motion_threshold,
        min_gap=max(1, settings.keyframe_min_gap_frames),
        max_gap=max(0, settings.keyframe_max_gap_frames),
    )

    total_frames = len(frame_records)
    confidence = pose_confidence(predictions)
    world_points = predictions.get("world_points")
    if world_points is None:
        world_points = predictions.get("world_points_from_depth")

    voxel_sets: list[set[int]] = [set() for _ in range(total_frames)]
    total_voxels = 0
    mean_conf = np.zeros(total_frames, dtype=np.float32)
    if confidence is not None:
        mean_conf = confidence.reshape(confidence.shape[0], -1).mean(axis=1)

    if confidence is not None and world_points is not None:
        voxel_sets, total_voxels = _frame_voxel_sets(
            np.asarray(world_points),
            np.asarray(confidence),
            threshold=settings.keyframe_coverage_confidence,
            voxel_size=settings.keyframe_coverage_voxel_size,
            max_points=max(1000, settings.keyframe_coverage_max_points),
        )

    total_voxels = max(total_voxels, 1)
    top_k = requested_top_k if requested_top_k > 0 else settings.keyframe_default_top_k
    top_k = max(min(top_k, total_frames), len(motion_indices))

    selected_set: set[int] = set(motion_indices)
    diagnostics: dict[int, dict[str, Any]] = {}
    covered: set[int] = set()

    for idx in motion_indices:
        gain_count = len(voxel_sets[idx] - covered) if voxel_sets[idx] else 0
        gain_ratio = gain_count / total_voxels
        covered.update(voxel_sets[idx])
        diagnostics[idx] = {
            "frame_id": frame_records[idx].frame_id,
            "frame_index": frame_records[idx].index,
            "reason": "motion",
            "motion_delta": float(motion_deltas[idx]),
            "cum_motion": float(motion_diag.get(idx, {}).get("cum_motion", 0.0)),
            "coverage_gain_ratio": float(gain_ratio),
            "coverage_gain_count": int(gain_count),
            "mean_confidence": float(mean_conf[idx]) if confidence is not None else None,
        }

    if len(selected_set) < top_k and total_voxels > 0:
        min_gain_ratio = settings.keyframe_min_gain_ratio
        remaining = [i for i in range(total_frames) if i not in selected_set and voxel_sets[i]]
        while remaining and len(selected_set) < top_k:
            best_idx = -1
            best_gain = -1
            best_ratio = -1.0
            for idx in remaining:
                gain = len(voxel_sets[idx] - covered)
                if gain <= 0:
                    continue
                ratio = gain / total_voxels
                if ratio > best_ratio or (np.isclose(ratio, best_ratio) and gain > best_gain):
                    best_idx = idx
                    best_gain = gain
                    best_ratio = ratio
            if best_idx == -1 or best_ratio < min_gain_ratio:
                break
            selected_set.add(best_idx)
            covered.update(voxel_sets[best_idx])
            diagnostics[best_idx] = {
                "frame_id": frame_records[best_idx].frame_id,
                "frame_index": frame_records[best_idx].index,
                "reason": "coverage",
                "motion_delta": float(motion_deltas[best_idx]),
                "cum_motion": float(motion_diag.get(best_idx, {}).get("cum_motion", 0.0)),
                "coverage_gain_ratio": float(best_ratio),
                "coverage_gain_count": int(best_gain),
                "mean_confidence": float(mean_conf[best_idx]) if confidence is not None else None,
            }
            remaining.remove(best_idx)

    if requested_top_k > 0 and len(selected_set) > requested_top_k:
        coverage_candidates = [idx for idx in selected_set if diagnostics[idx]["reason"] == "coverage"]
        coverage_candidates.sort(key=lambda idx: diagnostics[idx].get("coverage_gain_ratio", 0.0))
        while len(selected_set) > requested_top_k and coverage_candidates:
            drop_idx = coverage_candidates.pop(0)
            selected_set.remove(drop_idx)
            diagnostics.pop(drop_idx, None)

    final_indices = sorted(selected_set)
    final_diags = [diagnostics[idx] for idx in final_indices]
    return KeyframeSelectionResult(indices=final_indices, diagnostics=final_diags, top_k=len(final_indices))


def run_keyframe_prepass(
    *,
    runtime: WorkerRuntime,
    payload: Mapping[str, Any],
    frame_records: list[FrameRecord],
    mode: str,
    streaming: bool,
    window_size: int | None,
) -> KeyframeSelectionResult | None:
    if len(frame_records) <= 1:
        return None

    settings = runtime.settings
    top_k_payload = int(payload.get("prepass_top_k") or payload.get("top_k_frames") or payload.get("top_k") or 0)

    try:
        inference = run_stream3r_inference(
            runtime=runtime,
            image_paths=[record.path for record in frame_records],
            mode=mode,
            streaming=streaming,
            cache_output_path=None,
            progress_cb=None,
            window_size=window_size if streaming and mode == "window" else None,
        )
    except Exception:
        logger.exception("Keyframe pre-pass inference failed")
        return None

    try:
        return select_keyframes_motion_coverage(
            frame_records,
            inference.predictions,
            settings,
            requested_top_k=top_k_payload,
        )
    finally:
        del inference


def extract_video_frames(
    video_path: Path,
    output_dir: Path,
    *,
    target_fps: float | None = None,
    max_frames: int | None = None,
) -> tuple[list[FrameRecord], float]:
    if not video_path.exists():
        raise FileNotFoundError(f"Video file not found: {video_path}")

    output_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise RuntimeError(f"Failed to open video: {video_path}")

    native_fps = cap.get(cv2.CAP_PROP_FPS)
    if not native_fps or native_fps <= 0:
        native_fps = 30.0
    frame_interval = 1
    if target_fps and target_fps > 0:
        frame_interval = max(1, int(round(native_fps / target_fps)))

    frame_records: list[FrameRecord] = []
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
    frame_idx = 0
```
|
| 321 |
+
extracted = 0
|
| 322 |
+
success, frame = cap.read()
|
| 323 |
+
while success:
|
| 324 |
+
if frame_idx % frame_interval == 0:
|
| 325 |
+
frame_id = f"frame_{extracted:06d}"
|
| 326 |
+
frame_path = output_dir / f"{frame_id}.jpg"
|
| 327 |
+
if not cv2.imwrite(str(frame_path), frame):
|
| 328 |
+
cap.release()
|
| 329 |
+
raise RuntimeError(f"Failed to write frame: {frame_path}")
|
| 330 |
+
timestamp_s = frame_idx / native_fps
|
| 331 |
+
frame_records.append(
|
| 332 |
+
FrameRecord(
|
| 333 |
+
index=extracted,
|
| 334 |
+
frame_id=frame_id,
|
| 335 |
+
path=frame_path,
|
| 336 |
+
metadata={"frame_number": frame_idx, "timestamp_s": timestamp_s},
|
| 337 |
+
)
|
| 338 |
+
)
|
| 339 |
+
extracted += 1
|
| 340 |
+
if max_frames and extracted >= max_frames:
|
| 341 |
+
break
|
| 342 |
+
frame_idx += 1
|
| 343 |
+
success, frame = cap.read()
|
| 344 |
+
|
| 345 |
+
cap.release()
|
| 346 |
+
if not frame_records:
|
| 347 |
+
raise RuntimeError("No frames extracted from video")
|
| 348 |
+
|
| 349 |
+
return frame_records, native_fps
|
| 350 |
+
|
| 351 |
+
|
| 352 |
+
def linear_sample_indices(total: int, desired: int) -> list[int]:
|
| 353 |
+
if desired <= 0 or total <= desired:
|
| 354 |
+
return list(range(total))
|
| 355 |
+
step = total / desired
|
| 356 |
+
return [min(total - 1, int(round(i * step))) for i in range(desired)]
|
| 357 |
+
|
| 358 |
+
|
| 359 |
+
def build_keyframe_uploads(
|
| 360 |
+
runtime: WorkerRuntime,
|
| 361 |
+
scene_id: str,
|
| 362 |
+
selected_records: Iterable[FrameRecord],
|
| 363 |
+
diagnostics: list[dict[str, Any]],
|
| 364 |
+
*,
|
| 365 |
+
subdir: str,
|
| 366 |
+
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
|
| 367 |
+
diag_by_index = {entry.get("frame_index"): entry for entry in diagnostics}
|
| 368 |
+
storage_entries: list[dict[str, Any]] = []
|
| 369 |
+
media_entries: list[dict[str, Any]] = []
|
| 370 |
+
|
| 371 |
+
for record in selected_records:
|
| 372 |
+
diag = diag_by_index.get(record.index, {})
|
| 373 |
+
filename = f"{record.frame_id}.jpg"
|
| 374 |
+
key = runtime.storage.build_key(scene_id, subdir, filename)
|
| 375 |
+
uri = runtime.storage.upload_file(record.path, key, content_type="image/jpeg")
|
| 376 |
+
storage_entries.append(
|
| 377 |
+
{
|
| 378 |
+
"frame_id": record.frame_id,
|
| 379 |
+
"frame_index": record.index,
|
| 380 |
+
"url": uri,
|
| 381 |
+
"storage_key": key,
|
| 382 |
+
"diagnostics": diag,
|
| 383 |
+
}
|
| 384 |
+
)
|
| 385 |
+
|
| 386 |
+
media_entries.append(
|
| 387 |
+
{
|
| 388 |
+
"media_type": "image",
|
| 389 |
+
"file": key,
|
| 390 |
+
"captured_at": _diagnostic_captured_at(record, diag),
|
| 391 |
+
}
|
| 392 |
+
)
|
| 393 |
+
|
| 394 |
+
return storage_entries, media_entries
|
| 395 |
+
|
| 396 |
+
|
| 397 |
+
def _diagnostic_captured_at(record: FrameRecord, diag: Mapping[str, Any]) -> str | None:
|
| 398 |
+
if record.timestamp:
|
| 399 |
+
return record.timestamp
|
| 400 |
+
ts = diag.get("timestamp") or record.metadata.get("timestamp")
|
| 401 |
+
if isinstance(ts, str):
|
| 402 |
+
return ts
|
| 403 |
+
if isinstance(ts, (int, float)):
|
| 404 |
+
return datetime.utcfromtimestamp(float(ts)).isoformat() + "Z"
|
| 405 |
+
timestamp_s = record.metadata.get("timestamp_s")
|
| 406 |
+
if isinstance(timestamp_s, (int, float)):
|
| 407 |
+
return datetime.utcfromtimestamp(float(timestamp_s)).isoformat() + "Z"
|
| 408 |
+
return None
|
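The `linear_sample_indices` helper added above is the uniform-sampling fallback used when motion/coverage selection is unavailable. It can be exercised in isolation; this is a verbatim standalone copy of the function with two illustrative calls:

```python
def linear_sample_indices(total: int, desired: int) -> list[int]:
    # Evenly spaced frame indices over `total` frames (copied from keyframes.py).
    if desired <= 0 or total <= desired:
        return list(range(total))
    step = total / desired
    return [min(total - 1, int(round(i * step))) for i in range(desired)]

print(linear_sample_indices(100, 4))  # [0, 25, 50, 75]
print(linear_sample_indices(3, 10))   # [0, 1, 2] -- fewer frames than requested, return them all
```

Note that `desired <= 0` is treated the same as "keep everything", which is why a non-positive `top_k` disables downsampling rather than returning an empty list.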
stream3r/worker/main.py CHANGED

```diff
@@ -77,7 +77,7 @@ def main() -> None:
     if settings.default_job_timeout and settings.default_job_timeout > 0:
         Queue.DEFAULT_TIMEOUT = settings.default_job_timeout
 
-    args = _parse_args([settings.pose_queue, settings.model_queue])
+    args = _parse_args([settings.pose_queue, settings.model_queue, settings.keyframe_queue])
     logging.basicConfig(level=getattr(logging, str(args.log_level).upper(), logging.INFO))
 
     runtime = get_runtime()
```
stream3r/worker/tasks.py CHANGED

```diff
@@ -11,7 +11,6 @@ import shutil
 import tempfile
 import traceback
 import uuid
-from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
 from contextlib import nullcontext
@@ -24,6 +23,15 @@ from rq import get_current_job
 
 from stream3r.utils.visual_utils import predictions_to_glb
 
+from .keyframes import (
+    FrameRecord,
+    KeyframeSelectionResult,
+    build_keyframe_uploads,
+    extract_video_frames,
+    linear_sample_indices,
+    pose_confidence,
+    run_keyframe_prepass,
+)
 from .pipeline import InferenceResult, run_stream3r_inference
 from .runtime import WorkerRuntime, get_runtime
@@ -51,25 +59,6 @@ def _as_int(value: Any, default: int) -> int:
         return int(value)
     except (TypeError, ValueError):
         return default
-
-
-@dataclass(slots=True)
-class FrameRecord:
-    index: int
-    frame_id: str
-    path: Path
-    source: str | None = None
-    timestamp: str | None = None
-    metadata: dict[str, Any] = field(default_factory=dict)
-
-
-@dataclass(slots=True)
-class KeyframeSelectionResult:
-    indices: list[int]
-    diagnostics: list[dict[str, Any]]
-    top_k: int
-
-
 class ProgressTracker:
     """Aggregates frame progress to percentage updates."""
 
@@ -124,6 +113,53 @@ def _write_base64(content: str, destination: Path) -> None:
     destination.write_bytes(data)
 
 
+def _register_scene_media_entries(runtime: WorkerRuntime, scene_id: str, entries: list[dict[str, Any]]) -> None:
+    if not entries:
+        return
+    base_url = runtime.settings.scene_media_api_base_url
+    if not base_url:
+        logger.info("Scene media API base URL not configured; skipping registration for %s", scene_id)
+        return
+
+    url = f"{base_url.rstrip('/')}/scenes/{scene_id}/media"
+    headers: dict[str, str] = {"Content-Type": "application/json"}
+    token = runtime.settings.scene_media_api_token
+    if token:
+        headers["Authorization"] = f"Bearer {token}"
+    secret = runtime.settings.scene_media_api_secret
+    if secret:
+        headers["x-internal-secret"] = secret
+
+    try:
+        response = requests.post(url, json={"entries": entries}, headers=headers, timeout=30)
+        if response.status_code == 405:
+            logger.info(
+                "Scene media API does not accept POST at %s (status %s); skipping registration",
+                url,
+                response.status_code,
+            )
+            return
+        response.raise_for_status()
+    except requests.HTTPError as exc:
+        status = exc.response.status_code if exc.response is not None else None
+        if status == 422:
+            logger.warning(
+                "Scene media API rejected payload for scene %s (422): %s",
+                scene_id,
+                exc.response.text if exc.response is not None else "",
+            )
+            return
+        if status == 500:
+            logger.warning(
+                "Scene media API encountered server error (500) for scene %s; skipping registration",
+                scene_id,
+            )
+            return
+        logger.exception("Failed to register scene media entries for scene %s", scene_id)
+    except requests.RequestException:
+        logger.exception("Failed to register scene media entries for scene %s", scene_id)
+
+
 def _resolve_frame_entry(entry: Any, *, index: int, dest_dir: Path) -> FrameRecord:
     metadata: dict[str, Any] = {}
     timestamp = None
@@ -364,16 +400,6 @@ def _collect_frames_from_scene_media(
         offset += request_limit
 
     return records
-
-
-def _pose_confidence(predictions: Mapping[str, np.ndarray]) -> np.ndarray | None:
-    if "world_points_conf" in predictions:
-        return np.asarray(predictions["world_points_conf"], dtype=np.float32)
-    if "depth_conf" in predictions:
-        return np.asarray(predictions["depth_conf"], dtype=np.float32)
-    return None
-
-
 def _save_pointmaps(
     *,
     runtime: WorkerRuntime,
@@ -389,7 +415,7 @@ def _save_pointmaps(
         raise RuntimeError("Predictions missing world points")
 
     world_points = np.asarray(world_points)
-    confidence = _pose_confidence(predictions)
+    confidence = pose_confidence(predictions)
     if confidence is None:
         confidence = np.ones(world_points.shape[:-1], dtype=np.float32)
 
@@ -669,7 +695,7 @@ def _select_keyframes_motion_coverage(
         max_gap=max(0, settings.keyframe_max_gap_frames),
     )
     total_frames = len(frame_records)
-    confidence = _pose_confidence(predictions)
+    confidence = pose_confidence(predictions)
    world_points = predictions.get("world_points")
     if world_points is None:
         world_points = predictions.get("world_points_from_depth")
@@ -756,7 +782,7 @@ def _compute_selected_frames(
 ) -> list[dict[str, Any]]:
     if top_k <= 0:
         return []
-    confidence = _pose_confidence(predictions)
+    confidence = pose_confidence(predictions)
     if confidence is None:
         return []
     scores = confidence.reshape(confidence.shape[0], -1).mean(axis=1)
@@ -831,6 +857,11 @@ def _save_scene_glb(
         ceiling_margin_value = float(ceiling_margin_value) if ceiling_margin_value is not None else 0.05
     except (TypeError, ValueError):
         ceiling_margin_value = 0.05
+    ceiling_z_max = payload.get("ceiling_z_max")
+    try:
+        ceiling_z_max_value = float(ceiling_z_max) if ceiling_z_max is not None else None
+    except (TypeError, ValueError):
+        ceiling_z_max_value = None
 
     scene = predictions_to_glb(
         dict(predictions),
@@ -844,6 +875,7 @@
         prediction_mode=payload.get("prediction_mode", "Predicted Pointmap"),
         ceiling_percentile=ceiling_percentile_value,
         ceiling_margin=ceiling_margin_value,
+        ceiling_z_max=ceiling_z_max_value,
     )
     scene.export(file_obj=str(local_file))
     key = runtime.storage.build_key(
@@ -1302,6 +1334,174 @@ def model_build_job(payload: Mapping[str, Any]) -> dict[str, Any]:
     return _execute_job("model_build", payload, _handle_model_build)
 
 
+def _fallback_selection(frame_records: list[FrameRecord], top_k: int) -> KeyframeSelectionResult:
+    indices = linear_sample_indices(len(frame_records), top_k)
+    diagnostics = [
+        {
+            "frame_id": frame_records[idx].frame_id,
+            "frame_index": frame_records[idx].index,
+            "reason": "linear",
+        }
+        for idx in indices
+    ]
+    return KeyframeSelectionResult(indices=indices, diagnostics=diagnostics, top_k=len(indices))
+
+
+def keyframe_selection_job(payload: Mapping[str, Any]) -> dict[str, Any]:
+    runtime = get_runtime()
+    job = get_current_job()
+    payload = dict(payload)
+
+    job_id = str(payload.get("job_id") or (job.id if job else uuid.uuid4()))
+    scene_id = payload.get("scene_id")
+    if not scene_id:
+        raise ValueError("Keyframe job payload is missing 'scene_id'")
+    video_key = payload.get("video_key")
+    if not video_key:
+        raise ValueError("Keyframe job payload is missing 'video_key'")
+
+    job_type = "keyframe_selection"
+    job_meta = {
+        "job_id": job_id,
+        "job_type": job_type,
+        "scene_id": scene_id,
+    }
+
+    sanitized_payload = {
+        "scene_id": scene_id,
+        "video_key": video_key,
+        "top_k": payload.get("top_k"),
+        "extract_fps": payload.get("extract_fps"),
+        "extract_max_frames": payload.get("extract_max_frames"),
+    }
+
+    runtime.db.upsert_job(
+        job_id=job_id,
+        job_type=job_type,
+        scene_id=scene_id,
+        status="started",
+        payload=sanitized_payload,
+    )
+    runtime_emit(
+        runtime,
+        {
+            **job_meta,
+            "status": "started",
+            "progress": 0,
+            "ts": datetime.now(timezone.utc).timestamp(),
+        },
+    )
+
+    start_time = perf_counter()
+
+    try:
+        with tempfile.TemporaryDirectory(prefix=f"keyframe_{job_id}_") as tmp_dir:
+            temp_path = Path(tmp_dir)
+            video_path = temp_path / "input_video"
+            runtime.storage.download_to_path(video_key, video_path)
+
+            extract_fps = payload.get("extract_fps")
+            try:
+                extract_fps_value = float(extract_fps) if extract_fps is not None else runtime.settings.keyframe_extract_fps
+            except (TypeError, ValueError):
+                extract_fps_value = runtime.settings.keyframe_extract_fps
+
+            max_frames = _as_int(
+                payload.get("extract_max_frames"),
+                runtime.settings.keyframe_extract_max_frames,
+            )
+
+            frame_records, native_fps = extract_video_frames(
+                video_path,
+                temp_path / "frames",
+                target_fps=extract_fps_value,
+                max_frames=max_frames,
+            )
+
+            selection = run_keyframe_prepass(
+                runtime=runtime,
+                payload=payload,
+                frame_records=frame_records,
+                mode="window",
+                streaming=True,
+                window_size=runtime.settings.stream_window_size,
+            )
+            if selection is None or not selection.indices:
+                requested_top_k = _as_int(payload.get("top_k"), runtime.settings.keyframe_default_top_k)
+                selection = _fallback_selection(frame_records, requested_top_k)
+
+            selected_records = [frame_records[i] for i in selection.indices]
+            storage_entries, media_entries = build_keyframe_uploads(
+                runtime,
+                scene_id,
+                selected_records,
+                selection.diagnostics,
+                subdir=runtime.settings.keyframe_upload_dir,
+            )
+
+            _register_scene_media_entries(runtime, scene_id, media_entries)
+
+            result_payload = {
+                "job_id": job_id,
+                "job_type": job_type,
+                "scene_id": scene_id,
+                "video_key": video_key,
+                "native_fps": native_fps,
+                "total_frames": len(frame_records),
+                "selected_frames": storage_entries,
+                "selection": selection.diagnostics,
+            }
+
+    except Exception as exc:
+        error_text = traceback.format_exc()
+        runtime.db.upsert_job(
+            job_id=job_id,
+            job_type=job_type,
+            scene_id=scene_id,
+            status="failed",
+            error=error_text,
+        )
+        runtime_emit(
+            runtime,
+            {
+                **job_meta,
+                "status": "failed",
+                "ts": datetime.now(timezone.utc).timestamp(),
+                "error": str(exc),
+            },
+        )
+        logger.exception("Keyframe selection job %s failed", job_id)
+        raise
+
+    runtime.db.upsert_job(
+        job_id=job_id,
+        job_type=job_type,
+        scene_id=scene_id,
+        status="finished",
+        result=result_payload,
+    )
+
+    runtime_emit(
+        runtime,
+        {
+            **job_meta,
+            "status": "finished",
+            "progress": 100,
+            "ts": datetime.now(timezone.utc).timestamp(),
+        },
+    )
+
+    logger.info(
+        "Keyframe selection job %s finished in %.2fs (selected %d/%d frames)",
+        job_id,
+        perf_counter() - start_time,
+        len(selection.indices),
+        len(frame_records),
+    )
+
+    return result_payload
+
+
 def _handle_model_build(
     *,
     runtime: WorkerRuntime,
```
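The new `_fallback_selection` in tasks.py tags linearly sampled frames with `reason: "linear"` so downstream consumers can distinguish them from motion/coverage picks. A standalone sketch of that behavior, using a hypothetical `Frame` dataclass in place of the real `FrameRecord` and a copy of the `linear_sample_indices` helper:

```python
from dataclasses import dataclass


def linear_sample_indices(total: int, desired: int) -> list[int]:
    # Same sampling helper the job imports from .keyframes.
    if desired <= 0 or total <= desired:
        return list(range(total))
    step = total / desired
    return [min(total - 1, int(round(i * step))) for i in range(desired)]


@dataclass
class Frame:  # minimal stand-in for FrameRecord (illustration only)
    index: int
    frame_id: str


def fallback_selection(frames: list[Frame], top_k: int) -> tuple[list[int], list[dict]]:
    # Mirrors _fallback_selection: uniform sampling plus "linear" diagnostics.
    indices = linear_sample_indices(len(frames), top_k)
    diagnostics = [
        {"frame_id": frames[i].frame_id, "frame_index": frames[i].index, "reason": "linear"}
        for i in indices
    ]
    return indices, diagnostics


frames = [Frame(i, f"frame_{i:06d}") for i in range(8)]
indices, diags = fallback_selection(frames, 4)
# 8 frames, step 2.0 -> indices [0, 2, 4, 6], each tagged reason="linear"
```

This is why the job can fall back safely when the pre-pass returns nothing: the result payload keeps the same shape, only the `reason` field changes.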
tests/test_voxel_reduction.py CHANGED

```diff
@@ -222,3 +222,47 @@ def test_predictions_to_glb_ceiling_filter():
     point_cloud = next(iter(scene.geometry.values()))
     max_z = point_cloud.vertices[:, 2].max()
     assert max_z < 1.6
+
+
+def test_predictions_to_glb_ceiling_absolute_cut():
+    world_points = np.array(
+        [
+            [
+                [[0.0, 0.0, 0.5], [0.0, 0.0, 1.0]],
+                [[0.0, 0.0, 1.2], [0.0, 0.0, 2.0]],
+            ]
+        ],
+        dtype=np.float32,
+    )
+    predictions = {
+        "world_points": world_points,
+        "world_points_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "world_points_from_depth": world_points,
+        "depth_conf": np.ones((1, 2, 2), dtype=np.float32),
+        "images": np.ones((1, 2, 2, 3), dtype=np.float32) * 0.5,
+        "extrinsic": np.array(
+            [
+                [
+                    [1.0, 0.0, 0.0, 0.0],
+                    [0.0, 1.0, 0.0, 0.0],
+                    [0.0, 0.0, 1.0, 0.0],
+                ]
+            ],
+            dtype=np.float32,
+        ),
+    }
+
+    scene = predictions_to_glb(
+        predictions,
+        conf_thres=0.0,
+        voxel_size=None,
+        o3d_denoise=False,
+        density_filter=False,
+        reinflate_enabled=False,
+        ceiling_z_max=1.1,
+    )
+
+    assert isinstance(scene, trimesh.Scene)
+    point_cloud = next(iter(scene.geometry.values()))
+    max_z = point_cloud.vertices[:, 2].max()
+    assert max_z <= 1.1
```
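The new test exercises `ceiling_z_max` as an absolute height cut, as opposed to the percentile-based filter tested above it. Stripped of the GLB plumbing, the cut itself is a plain boolean mask over the z column; this sketch uses the same point heights as the test's `world_points` (0.5, 1.0, 1.2, 2.0):

```python
import numpy as np

points = np.array(
    [
        [0.0, 0.0, 0.5],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 1.2],
        [0.0, 0.0, 2.0],
    ],
    dtype=np.float32,
)

# Absolute ceiling cut at z = 1.1: keep only points at or below the threshold.
kept = points[points[:, 2] <= 1.1]
# Two points survive (z = 0.5 and z = 1.0), so the max kept z is 1.0.
```

This matches the test's assertion that no vertex in the exported scene exceeds `ceiling_z_max`.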
worker/stream3r/jobs.py CHANGED

```diff
@@ -4,12 +4,13 @@ from __future__ import annotations
 
 from typing import Any, Callable, Mapping
 
-from stream3r.worker.tasks import model_build_job, pose_pointmap_job
+from stream3r.worker.tasks import keyframe_selection_job, model_build_job, pose_pointmap_job
 
 
 _HANDLERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {
     "pose_pointmap": pose_pointmap_job,
     "model_build": model_build_job,
+    "keyframe_selection": keyframe_selection_job,
 }
 
 
@@ -42,7 +43,7 @@ def handle_job(*args: Any, **kwargs: Any) -> Any:
     if isinstance(candidate, Mapping):
         payload = candidate
 
-    if payload is None and isinstance(args[0], Mapping):
+    if payload is None and args and isinstance(args[0], Mapping):
         payload = args[0]
     job_type = str(payload.get("job_type")) if payload else job_type
 
@@ -60,4 +61,3 @@
         raise ValueError(f"Unsupported job_type '{job_type}'")
 
     return handler(payload)
-
```
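The jobs.py change registers the new handler in `_HANDLERS` and guards `args[0]` so an empty call raises a clean `ValueError` instead of an `IndexError`. A simplified, self-contained sketch of that dispatch pattern (the handler bodies here are hypothetical stubs; the real ones run inference):

```python
from typing import Any, Callable, Mapping

# Hypothetical stand-ins for the three registered job handlers.
def pose_pointmap_job(payload: Mapping[str, Any]) -> Any:
    return ("pose", payload["scene_id"])

def model_build_job(payload: Mapping[str, Any]) -> Any:
    return ("model", payload["scene_id"])

def keyframe_selection_job(payload: Mapping[str, Any]) -> Any:
    return ("keyframe", payload["scene_id"])

HANDLERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {
    "pose_pointmap": pose_pointmap_job,
    "model_build": model_build_job,
    "keyframe_selection": keyframe_selection_job,
}

def handle_job(*args: Any) -> Any:
    # The `args and` guard mirrors the fix in the diff: with no positional
    # arguments we fall through to the unsupported-job error, not IndexError.
    payload = args[0] if args and isinstance(args[0], Mapping) else None
    job_type = str(payload.get("job_type")) if payload else None
    handler = HANDLERS.get(job_type)
    if handler is None:
        raise ValueError(f"Unsupported job_type '{job_type}'")
    return handler(payload)

result = handle_job({"job_type": "keyframe_selection", "scene_id": "s1"})
# -> ("keyframe", "s1")
```

Keeping dispatch in a flat dict means adding a job type touches exactly two lines: the import and the registry entry.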