```markdown
# Keyframe Selection Service
**Author:** Brian Clark
**Last Updated:** 2025-11-07
**Audience:** Integrators needing motion-aware keyframes instead of linear sampling
---
## Overview
The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).
The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.
---
## Job Inputs
| Field | Required | Description |
|-------|----------|-------------|
| `scene_id` | ✅ | Scene identifier used in `scene_media` |
| `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
| `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
| `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
| `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
| `ceiling_percentile`, `ceiling_margin`, `ceiling_z_max` | optional | Ceiling-trimming filters that keep downstream GLBs clean |
Example payload:
```json
{
  "job_type": "keyframe_selection",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "top_k": 16,
  "extract_fps": 6.0
}
```
Enqueue via RQ:
```python
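from redis import Redis
from rq import Queue

# Connect to the Redis instance backing the worker queue (adjust the URL for your deployment).
queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))

# Enqueue by dotted path so the caller does not need to import the worker package.
job = queue.enqueue("worker.stream3r.jobs.handle_job", {
    "job_type": "keyframe_selection",
    "scene_id": "scene-123",
    "video_key": "scene-data/videos/scene-123.mp4",
})
print(job.id)  # keep the ID if you need to poll job status later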
```
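
Callers that build payloads in several places may want one helper that enforces the required fields and leaves unset options out, so the worker's environment defaults apply. The following is a minimal sketch under that assumption; `build_keyframe_payload` is a hypothetical helper name, not part of the worker, and the positivity checks are illustrative rather than the worker's own validation.

```python
from typing import Any, Dict, Optional


def build_keyframe_payload(
    scene_id: str,
    video_key: str,
    top_k: Optional[int] = None,
    extract_fps: Optional[float] = None,
    extract_max_frames: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a keyframe_selection payload, omitting unset optional fields."""
    if not scene_id or not video_key:
        raise ValueError("scene_id and video_key are required")

    payload: Dict[str, Any] = {
        "job_type": "keyframe_selection",
        "scene_id": scene_id,
        "video_key": video_key,
    }
    optional = {
        "top_k": top_k,
        "extract_fps": extract_fps,
        "extract_max_frames": extract_max_frames,
    }
    for name, value in optional.items():
        if value is None:
            continue  # leave unset so the worker applies its env default
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value!r}")
        payload[name] = value
    return payload
```

The returned dictionary can be passed as the job argument to `queue.enqueue(...)` in place of the literal payload.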
---
## Processing Pipeline
1. **Download & Extract Frames**
   - The video is downloaded via `runtime.storage.download_to_path`.
   - Frames are decoded with OpenCV at `extract_fps` (default 6 fps), up to `extract_max_frames`.
2. **Motion + Coverage Pre-pass**
   - A lightweight Stream3R windowed inference pass collects camera poses (`extrinsic`) and confidences.
   - Motion scoring: combines the translation delta with a weighted rotation delta and greedily selects frames to ensure pose diversity.
   - Coverage scoring: counts the high-confidence voxel IDs each frame contributes and greedily maximizes new coverage.
   - Diagnostics are stored per frame (reason, motion delta, coverage gain, confidence).
   - If inference fails, the worker falls back to linear sampling.
3. **Selection & Upload**
   - Selected frames are copied and uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
   - Scene media rows are inserted through the `record_scene_media_entries` API.
   - A manifest (`selected_frames`) is returned with storage keys and diagnostics.
4. **Result**
   - Job metadata includes the native video FPS, total extracted frames, selected frame details, and diagnostics.
   - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
   - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs the failure and skips registration without failing the job.
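
The motion-scoring step can be illustrated with a small greedy pass over camera poses. This is a sketch of one common way to combine translation and weighted rotation deltas, not the worker's actual implementation. It assumes each pose is given as a 3x3 rotation matrix plus a camera position, and the names `rotation_angle`, `motion_delta`, and `select_by_motion` are hypothetical. The defaults mirror `STREAM3R_KEYFRAME_MOTION_THRESH`, `STREAM3R_KEYFRAME_ROT_WEIGHT`, and `STREAM3R_KEYFRAME_TOP_K`.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Pose = Tuple[Matrix3, Sequence[float]]  # (rotation, camera position); assumed layout


def rotation_angle(r_a: Matrix3, r_b: Matrix3) -> float:
    """Angle in radians of the relative rotation between two rotation matrices."""
    # trace(r_a^T @ r_b) equals the element-wise inner product of the matrices.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_theta)


def motion_delta(a: Pose, b: Pose, rot_weight: float = 0.5) -> float:
    """Translation distance plus weighted rotation angle between two poses."""
    translation = math.dist(a[1], b[1])
    return translation + rot_weight * rotation_angle(a[0], b[0])


def select_by_motion(
    poses: Sequence[Pose],
    top_k: int = 16,
    motion_thresh: float = 0.4,
    rot_weight: float = 0.5,
) -> List[int]:
    """Greedily keep frames that moved enough since the last kept frame."""
    if not poses or top_k <= 0:
        return []
    selected = [0]  # always keep the first frame as the reference view
    for idx in range(1, len(poses)):
        if len(selected) >= top_k:
            break
        last = poses[selected[-1]]
        if motion_delta(last, poses[idx], rot_weight) >= motion_thresh:
            selected.append(idx)
    return selected
```

Comparing each candidate against the last kept frame, rather than the previous frame, prevents slow continuous motion from being discarded entirely.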
---
## Outputs & Diagnostics
Result payload (simplified):
```json
{
  "job_id": "...",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "native_fps": 29.97,
  "total_frames": 420,
  "selected_frames": [
    {
      "frame_id": "frame_000012",
      "frame_index": 12,
      "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
      "diagnostics": {
        "reason": "motion",
        "motion_delta": 0.42,
        "coverage_gain_ratio": 0.08,
        "mean_confidence": 0.67
      }
    }
  ]
}
```
Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
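
For QA dashboards or logs, a consumer might reduce the result payload to a short summary. The sketch below assumes the simplified payload shape shown above and tolerates missing diagnostics; `summarize_selection` is a hypothetical helper, and the real payload may carry additional fields.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_selection(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a keyframe_selection result into counts and storage keys."""
    frames: List[Dict[str, Any]] = result.get("selected_frames", [])

    # Count how many frames were kept for each selection reason.
    reasons = Counter(
        f.get("diagnostics", {}).get("reason", "unknown") for f in frames
    )
    confidences = [
        f["diagnostics"]["mean_confidence"]
        for f in frames
        if "mean_confidence" in f.get("diagnostics", {})
    ]
    return {
        "selected": len(frames),
        "total_frames": result.get("total_frames"),
        "reasons": dict(reasons),
        "mean_confidence": (
            sum(confidences) / len(confidences) if confidences else None
        ),
        "storage_keys": [f["storage_key"] for f in frames if "storage_key" in f],
    }
```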
---
## Configuration
Environment variables for fine-tuning:
| Env Var | Default | Notes |
|---------|---------|-------|
| `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
| `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
| `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
| `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
| `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
| `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
| `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
| `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
| `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
| `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Use full attention instead of windowed inference when the frame count is below this value |
Scene media API requirements:
- `STREAM3R_MEDIA_API_BASE_URL`
- `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts
Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
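
To show how the variables and defaults in the table fit together, here is a sketch that reads them into a typed config object. It is an illustration only: `KeyframeConfig` and `load_keyframe_config` are hypothetical names, and the way the real worker parses these values (in particular the boolean form of `STREAM3R_KEYFRAME_PREPASS`) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KeyframeConfig:
    queue: str
    extract_fps: float
    extract_max_frames: int
    upload_dir: str
    top_k: int
    prepass: bool
    motion_thresh: float
    rot_weight: float
    min_gain: float
    full_max_frames: int


def load_keyframe_config(env: Mapping[str, str] = os.environ) -> KeyframeConfig:
    """Read keyframe settings from env, falling back to the documented defaults."""
    get = env.get
    # Assumption: "1", "true", or "yes" enable the pre-pass; anything else disables it.
    prepass_raw = get("STREAM3R_KEYFRAME_PREPASS", "1").strip().lower()
    return KeyframeConfig(
        queue=get("STREAM3R_QUEUE_KEYFRAME", "keyframe_selection"),
        extract_fps=float(get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        extract_max_frames=int(get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        upload_dir=get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        top_k=int(get("STREAM3R_KEYFRAME_TOP_K", "16")),
        prepass=prepass_raw in {"1", "true", "yes"},
        motion_thresh=float(get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        rot_weight=float(get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        min_gain=float(get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        full_max_frames=int(get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the process environment.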
---
## Integration Steps for External Services
1. **Deploy Worker Queue**
   - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the queue env var is set).
   - A GPU is not strictly required: the pre-pass runs Stream3R inference, so CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`; otherwise schedule the worker on GPU hosts.
2. **Enqueue Jobs**
   - Replace existing linear sampling code with an RQ enqueue call.
   - Store job IDs if you need to poll job status or consume events.
3. **Consume Results**
   - After completion, list `scene_media` for `media_type=image` to retrieve the new keyframe entries.
   - Inspect the returned diagnostics for debugging or to render navigation overlays.
4. **Fallback Handling**
   - If the job fails, the queue returns error details, and you can revert to your existing sampler.
   - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
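
The fallback in step 4 can be kept very small on the caller side. The sketch below assumes a failed or empty job result should degrade to evenly spaced frame indices; `linear_sample_indices` and `frame_indices_or_fallback` are hypothetical helpers standing in for whatever linear sampler the integrating service already has.

```python
from typing import Any, Dict, List, Optional


def linear_sample_indices(total_frames: int, top_k: int) -> List[int]:
    """Return up to top_k evenly spaced frame indices, endpoints included."""
    if total_frames <= 0 or top_k <= 0:
        return []
    if top_k >= total_frames:
        return list(range(total_frames))
    if top_k == 1:
        return [0]
    step = (total_frames - 1) / (top_k - 1)  # step > 1, so indices stay distinct
    return [round(i * step) for i in range(top_k)]


def frame_indices_or_fallback(
    result: Optional[Dict[str, Any]],
    total_frames: int,
    top_k: int = 16,
) -> List[int]:
    """Use the job's selected frames when present, else sample linearly."""
    frames = (result or {}).get("selected_frames") or []
    if frames:
        return [f["frame_index"] for f in frames]
    return linear_sample_indices(total_frames, top_k)
```

Here `result` is the job's result payload, or `None` when the job failed.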
---
## Benefits vs. Linear Sampling
- **Fewer redundant frames**: motion-aware spacing ensures pose diversity.
- **Better geometry coverage**: only keeps frames that add new high-confidence voxels.
- **Consistent diagnostics**: each selected frame includes reasons and confidence, aiding QA.
- **Automatic uploads**: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
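
To make the coverage claim concrete, the following sketch shows a greedy selection that only keeps frames adding new voxels. It is not the worker's code: it assumes each frame has already been reduced to a set of high-confidence voxel IDs, defines the gain ratio as new voxels divided by all observed voxels, and uses the hypothetical name `select_by_coverage`. The defaults mirror `STREAM3R_KEYFRAME_TOP_K` and `STREAM3R_KEYFRAME_MIN_GAIN`.

```python
from typing import List, Sequence, Set


def select_by_coverage(
    frame_voxels: Sequence[Set[int]],
    top_k: int = 16,
    min_gain: float = 0.01,
) -> List[int]:
    """Greedily pick frames that contribute the most not-yet-covered voxels."""
    total = len(set().union(*frame_voxels))
    if total == 0 or top_k <= 0:
        return []

    covered: Set[int] = set()
    remaining = list(range(len(frame_voxels)))
    selected: List[int] = []
    while remaining and len(selected) < top_k:
        # max() returns the first maximal item, so ties go to the earliest frame.
        best = max(remaining, key=lambda i: len(frame_voxels[i] - covered))
        gain_ratio = len(frame_voxels[best] - covered) / total
        if gain_ratio < min_gain:
            break  # no remaining frame adds enough new coverage
        selected.append(best)
        covered |= frame_voxels[best]
        remaining.remove(best)
    return selected
```

With linear sampling, a frame that sees only already-covered voxels would still be kept; in this scheme it is dropped once its gain falls below `min_gain`.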
---
## Future Enhancements
- Optional semantic filtering to avoid ceilings/walls.
- Exposure of thumbnails or depth maps alongside keyframes.
- Batch selection across multiple videos.
---
For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.
```