| ```markdown | |
| # Keyframe Selection Service | |
| **Author:** Brian Clark | |
| **Last Updated:** 2025-11-07 | |
| **Audience:** Integrators needing motion-aware keyframes instead of linear sampling | |
| --- | |
| ## Overview | |
| The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan). | |
| The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling. | |
| --- | |
| ## Job Inputs | |
| | Field | Required | Description | | |
| |-------|----------|-------------| | |
| | `scene_id` | ✅ | Scene identifier used in `scene_media` | | |
| | `video_key` | ✅ | Storage key (Backblaze/S3) for the source video | | |
| | `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` | | |
| | `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) | | |
| | `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) | | |
| | Optional filters | optional | e.g., `ceiling_percentile`, `ceiling_z_max` for trimming the downstream GLB | |
| Example payload: | |
| ```json | |
| { | |
| "job_type": "keyframe_selection", | |
| "scene_id": "scene-123", | |
| "video_key": "scene-data/videos/scene-123.mp4", | |
| "top_k": 16, | |
| "extract_fps": 6.0 | |
| } | |
| ``` | |
| Enqueue via RQ: | |
| ```python | |
| from rq import Queue | |
| from redis import Redis | |
| queue = Queue("keyframe_selection", connection=Redis.from_url("redis://")) | |
| queue.enqueue("worker.stream3r.jobs.handle_job", { | |
| "job_type": "keyframe_selection", | |
| "scene_id": "scene-123", | |
| "video_key": "scene-data/videos/scene-123.mp4" | |
| }) | |
| ``` | |
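A small helper can keep optional overrides out of the payload unless the caller sets them, so the worker's environment defaults still apply. This is only a sketch: `build_keyframe_job` is a hypothetical name, not part of the worker, and it assumes a missing key means "use the default", which the inputs table implies but does not state outright.

```python
from typing import Any, Dict, Optional


def build_keyframe_job(
    scene_id: str,
    video_key: str,
    top_k: Optional[int] = None,
    extract_fps: Optional[float] = None,
    extract_max_frames: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a keyframe_selection payload (hypothetical helper)."""
    if not scene_id or not video_key:
        raise ValueError("scene_id and video_key are required")
    payload: Dict[str, Any] = {
        "job_type": "keyframe_selection",
        "scene_id": scene_id,
        "video_key": video_key,
    }
    # Include only overrides that were explicitly set, so the worker can
    # apply its STREAM3R_KEYFRAME_* defaults to everything else (assumed).
    overrides = {
        "top_k": top_k,
        "extract_fps": extract_fps,
        "extract_max_frames": extract_max_frames,
    }
    for name, value in overrides.items():
        if value is None:
            continue
        if value <= 0:
            raise ValueError(f"{name} must be positive")
        payload[name] = value
    return payload
```

The returned dict can be passed as the second argument of `queue.enqueue(...)` in place of the literal payload.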
| --- | |
| ## Processing Pipeline | |
| 1. **Download & Extract Frames** | |
| - Video pulled via `runtime.storage.download_to_path`. | |
| - Frames decoded with OpenCV at `extract_fps` (default 6 fps) up to `extract_max_frames`. | |
| 2. **Motion + Coverage Pre-pass** | |
| - Lightweight Stream3R windowed inference run to collect poses (`extrinsic`) and confidences. | |
| - Motion scoring: combines the translation delta with a weighted rotation delta between poses, then greedily selects frames to ensure pose diversity. | |
| - Coverage scoring: counts high-confidence voxel IDs contributed per frame, greedily maximizes new coverage. | |
| - Diagnostics stored per frame (reason, motion delta, coverage gain, confidence). | |
| - If inference fails, the worker falls back to linear sampling. | |
| 3. **Selection & Upload** | |
| - Selected frames copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`). | |
| - Scene media rows inserted through `record_scene_media_entries` API. | |
| - Manifest (`selected_frames`) returned with storage keys and diagnostics. | |
| 4. **Result** | |
| - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics. | |
| - `selected_frames.json` (optional) can be stored by downstream jobs for auditing. | |
| - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs and skips registration without failing the job. | |
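The motion-scoring step in the pre-pass can be illustrated with a short sketch. It is not the worker's implementation. It assumes each pose is a 3x3 rotation plus a translation vector, that the translation is a usable proxy for camera position, and that a frame is kept once its motion delta from the last kept frame reaches the threshold. The function names are hypothetical; the default values mirror `STREAM3R_KEYFRAME_MOTION_THRESH`, `STREAM3R_KEYFRAME_ROT_WEIGHT`, and `STREAM3R_KEYFRAME_TOP_K`.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Pose = Tuple[Matrix3, Sequence[float]]  # (3x3 rotation, translation vector)


def rotation_angle(r_a: Matrix3, r_b: Matrix3) -> float:
    """Angle in radians of the relative rotation r_a^T @ r_b."""
    # trace(r_a^T @ r_b) equals the sum of element-wise products.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


def motion_delta(a: Pose, b: Pose, rot_weight: float = 0.5) -> float:
    """Translation distance plus weighted rotation angle between two poses."""
    translation = math.dist(a[1], b[1])
    return translation + rot_weight * rotation_angle(a[0], b[0])


def select_by_motion(
    poses: List[Pose],
    thresh: float = 0.4,
    rot_weight: float = 0.5,
    top_k: int = 16,
) -> List[int]:
    """Greedily keep frames that moved at least `thresh` from the last kept one."""
    if not poses:
        return []
    selected = [0]  # always keep the first frame as the reference view
    for idx in range(1, len(poses)):
        if len(selected) >= top_k:
            break
        last = poses[selected[-1]]
        if motion_delta(last, poses[idx], rot_weight) >= thresh:
            selected.append(idx)
    return selected
```

Comparing against the last kept frame, rather than the previous frame, lets slow but steady camera motion accumulate until it crosses the threshold.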
| --- | |
| ## Outputs & Diagnostics | |
| Result payload (simplified): | |
| ```json | |
| { | |
| "job_id": "...", | |
| "scene_id": "scene-123", | |
| "video_key": "scene-data/videos/scene-123.mp4", | |
| "native_fps": 29.97, | |
| "total_frames": 420, | |
| "selected_frames": [ | |
| { | |
| "frame_id": "frame_000012", | |
| "frame_index": 12, | |
| "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg", | |
| "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg", | |
| "diagnostics": { | |
| "reason": "motion", | |
| "motion_delta": 0.42, | |
| "coverage_gain_ratio": 0.08, | |
| "mean_confidence": 0.67 | |
| } | |
| } | |
| ] | |
| } | |
| ``` | |
| Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`). | |
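To show how a consumer might read the result payload, the sketch below tallies selection reasons and flags low-confidence frames. It relies only on the fields shown in the simplified payload. `summarize_selection` and the `min_confidence` cutoff of 0.5 are illustrative assumptions, not part of the service.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_selection(
    result: Dict[str, Any], min_confidence: float = 0.5
) -> Dict[str, Any]:
    """Summarize a keyframe_selection result payload for QA (hypothetical helper)."""
    frames: List[Dict[str, Any]] = result.get("selected_frames", [])
    # Count how many frames were kept for each reason (e.g. "motion").
    reasons = Counter(
        f.get("diagnostics", {}).get("reason", "unknown") for f in frames
    )
    # Flag frames whose mean confidence is under the (assumed) QA cutoff.
    low_confidence = [
        f["frame_id"]
        for f in frames
        if f.get("diagnostics", {}).get("mean_confidence", 0.0) < min_confidence
    ]
    total = result.get("total_frames") or 0
    return {
        "selected": len(frames),
        "keep_ratio": len(frames) / total if total else 0.0,
        "reasons": dict(reasons),
        "low_confidence": low_confidence,
    }
```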
| --- | |
| ## Configuration | |
| Environment variables for fine-tuning: | |
| | Env Var | Default | Notes | | |
| |---------|---------|-------| | |
| | `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name | | |
| | `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS | | |
| | `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap | | |
| | `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory | | |
| | `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget | | |
| | `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference | | |
| | `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold | | |
| | `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight | | |
| | `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain | | |
| | `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Use full attention when the frame count is below this value | |
| Scene media API requirements: | |
| - `STREAM3R_MEDIA_API_BASE_URL` | |
| - `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts | |
| Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean. | |
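For integrators who want to mirror these settings in their own tooling, the sketch below reads the documented variables with the documented defaults. `KeyframeConfig` and `load_keyframe_config` are hypothetical names, and treating `STREAM3R_KEYFRAME_PREPASS` as enabled only when it equals `"1"` is an assumption. The worker's real parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KeyframeConfig:
    queue: str
    extract_fps: float
    extract_max_frames: int
    upload_dir: str
    top_k: int
    prepass: bool
    motion_thresh: float
    rot_weight: float
    min_gain: float
    full_max_frames: int


def load_keyframe_config(env: Mapping[str, str] = os.environ) -> KeyframeConfig:
    """Read the documented STREAM3R_* variables, applying the documented defaults."""
    get = env.get
    return KeyframeConfig(
        queue=get("STREAM3R_QUEUE_KEYFRAME", "keyframe_selection"),
        extract_fps=float(get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        extract_max_frames=int(get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        upload_dir=get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        top_k=int(get("STREAM3R_KEYFRAME_TOP_K", "16")),
        # Assumption: "1" enables the pre-pass, any other value disables it.
        prepass=get("STREAM3R_KEYFRAME_PREPASS", "1") == "1",
        motion_thresh=float(get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        rot_weight=float(get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        min_gain=float(get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        full_max_frames=int(get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.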
| --- | |
| ## Integration Steps for External Services | |
| 1. **Deploy Worker Queue** | |
| - Run the Stream3R worker with `--queue keyframe_selection` (already default when env var set). | |
| - A GPU is not strictly required. The pre-pass runs Stream3R inference, so CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`; otherwise schedule the worker on GPU hosts. | |
| 2. **Enqueue Jobs** | |
| - Replace existing linear sampling code with an RQ enqueue call. | |
| - Store job IDs if you need to poll job status or consume events. | |
| 3. **Consume Results** | |
| - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries. | |
| - Inspect returned diagnostics for debugging or to render navigation overlays. | |
| 4. **Fallback Handling** | |
| - If the job fails, the queue returns error details; you can revert to your existing sampler. | |
| - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`). | |
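If you have no existing sampler to revert to, an evenly spaced index picker is a reasonable stand-in for the fallback path. This is a generic sketch of linear sampling under that assumption, not the worker's own fallback code. `linear_sample_indices` is a hypothetical name.

```python
from typing import List


def linear_sample_indices(total_frames: int, k: int) -> List[int]:
    """Pick up to k evenly spaced frame indices from range(total_frames)."""
    if total_frames <= 0 or k <= 0:
        return []
    if k >= total_frames:
        # Fewer frames than the budget: keep them all.
        return list(range(total_frames))
    if k == 1:
        return [total_frames // 2]
    # Spread k samples so the first and last frames are both included.
    step = (total_frames - 1) / (k - 1)
    return [round(i * step) for i in range(k)]
```

Calling `linear_sample_indices(total_frames, top_k)` with the same budget you requested from the job keeps downstream consumers working with a similar number of frames.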
| --- | |
| ## Benefits vs. Linear Sampling | |
| - **Fewer redundant frames**: motion-aware spacing ensures pose diversity. | |
| - **Better geometry coverage**: only keeps frames that add new high-confidence voxels. | |
| - **Consistent diagnostics**: each selected frame includes reasons and confidence, aiding QA. | |
| - **Automatic uploads**: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers. | |
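The coverage benefit comes from a greedy rule: keep a frame only if it adds enough voxels that no earlier pick has covered. The sketch below shows one way to express that rule. It assumes each frame maps to a set of high-confidence voxel IDs and that gain is measured as a fraction of all voxels seen across the video. Both are assumptions, as the service's actual definition of `coverage_gain_ratio` is not documented here. `select_by_coverage` is a hypothetical name.

```python
from typing import Dict, List, Set


def select_by_coverage(
    frame_voxels: Dict[int, Set[int]],
    top_k: int = 16,
    min_gain: float = 0.01,
) -> List[int]:
    """Greedy max-coverage: repeatedly take the frame adding the most unseen voxels."""
    universe: Set[int] = set().union(*frame_voxels.values())
    if not universe:
        return []
    covered: Set[int] = set()
    remaining = dict(frame_voxels)
    selected: List[int] = []
    while remaining and len(selected) < top_k:
        # Largest gain wins; ties go to the earliest frame index.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain_ratio = len(remaining[best] - covered) / len(universe)
        if gain_ratio < min_gain:
            break  # no remaining frame adds enough new coverage
        covered |= remaining.pop(best)
        selected.append(best)
    return sorted(selected)
```

Stopping at `min_gain` is what discards redundant views: once the best remaining frame adds almost nothing, no other frame can do better.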
| --- | |
| ## Future Enhancements | |
| - Optional semantic filtering to avoid ceilings/walls. | |
| - Exposure of thumbnails or depth maps alongside keyframes. | |
| - Batch selection across multiple videos. | |
| --- | |
| For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details. | |
| ``` | |