```markdown
# Keyframe Selection Service

**Author:** Brian Clark  
**Last Updated:** 2025-11-07  
**Audience:** Integrators needing motion-aware keyframes instead of linear sampling

---

## Overview

The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).

The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.

---

## Job Inputs

| Field | Required | Description |
|-------|----------|-------------|
| `scene_id` | ✅ | Scene identifier used in `scene_media` |
| `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
| `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
| `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
| `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
| Optional filters | optional | e.g., `ceiling_percentile`, `ceiling_z_max` for downstream GLB |

Example payload:

```json
{
  "job_type": "keyframe_selection",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "top_k": 16,
  "extract_fps": 6.0
}
```

Enqueue via RQ:

```python
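from rq import Queue
from redis import Redis

# Connect to Redis; "redis://" resolves to the local default instance.
queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))

# Enqueue by dotted path so the caller does not need to import the worker code.
# Optional fields (top_k, extract_fps, ...) fall back to the worker defaults.
queue.enqueue("worker.stream3r.jobs.handle_job", {
    "job_type": "keyframe_selection",
    "scene_id": "scene-123",
    "video_key": "scene-data/videos/scene-123.mp4"
})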
```
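
As a rough sketch, a caller could assemble the payload with a small helper that leaves out unset optional fields so the worker defaults apply. The name `build_keyframe_job` and the positivity check are assumptions made for illustration; they are not part of the worker.

```python
from typing import Any, Dict, Optional


def build_keyframe_job(
    scene_id: str,
    video_key: str,
    top_k: Optional[int] = None,
    extract_fps: Optional[float] = None,
    extract_max_frames: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a keyframe_selection payload, omitting unset optional fields.

    Hypothetical helper: omitted fields are expected to fall back to the
    worker-side STREAM3R_KEYFRAME_* defaults described in this document.
    """
    if not scene_id or not video_key:
        raise ValueError("scene_id and video_key are required")
    payload: Dict[str, Any] = {
        "job_type": "keyframe_selection",
        "scene_id": scene_id,
        "video_key": video_key,
    }
    optional = {
        "top_k": top_k,
        "extract_fps": extract_fps,
        "extract_max_frames": extract_max_frames,
    }
    for name, value in optional.items():
        if value is None:
            continue  # leave the field out so the worker default is used
        if value <= 0:
            # Assumed sanity check; the worker may validate differently.
            raise ValueError(f"{name} must be positive, got {value!r}")
        payload[name] = value
    return payload
```

The returned dictionary can be passed as the second argument to `queue.enqueue(...)` in place of a hand-written payload.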

---

## Processing Pipeline

1. **Download & Extract Frames**
   - Video pulled via `runtime.storage.download_to_path`.
   - Frames decoded with OpenCV at `extract_fps` (default 6 fps) up to `extract_max_frames`.

2. **Motion + Coverage Pre-pass**
   - A lightweight Stream3R windowed inference pass collects poses (`extrinsic`) and confidences.
   - Motion scoring: combines translation and weighted rotation deltas between poses, selecting frames greedily to ensure pose diversity.
   - Coverage scoring: counts the high-confidence voxel IDs each frame contributes, selecting frames greedily to maximize new coverage.
   - Diagnostics are stored per frame (reason, motion delta, coverage gain, confidence).
   - If inference fails, the worker falls back to linear sampling.

3. **Selection & Upload**
   - Selected frames copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
   - Scene media rows inserted through `record_scene_media_entries` API.
   - Manifest (`selected_frames`) returned with storage keys and diagnostics.

4. **Result**
   - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics.
   - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
   - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs and skips registration without failing the job.
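
The motion scoring in step 2 can be illustrated with a minimal sketch. It assumes each pose is a row-major `[R | t]` extrinsic, treats the last column as the translation, and compares each frame against the most recently kept one. The function names are hypothetical and the actual scoring in the worker may differ; the defaults mirror `STREAM3R_KEYFRAME_MOTION_THRESH` and `STREAM3R_KEYFRAME_ROT_WEIGHT`.

```python
import math
from typing import List, Sequence

Pose = Sequence[Sequence[float]]  # 4x4 (or 3x4) extrinsic, row-major [R | t]


def motion_delta(a: Pose, b: Pose, rot_weight: float = 0.5) -> float:
    """Translation distance plus weighted relative rotation angle (radians)."""
    translation = math.sqrt(sum((b[i][3] - a[i][3]) ** 2 for i in range(3)))
    # trace(R_a^T R_b) equals the element-wise product sum of both rotations.
    trace = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against floating-point drift outside acos's domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return translation + rot_weight * math.acos(cos_angle)


def select_by_motion(
    poses: Sequence[Pose],
    top_k: int = 16,
    motion_thresh: float = 0.4,
    rot_weight: float = 0.5,
) -> List[int]:
    """Greedily keep frames that moved enough since the last kept frame."""
    if not poses or top_k <= 0:
        return []
    selected = [0]  # always keep the first frame as the reference view
    for idx in range(1, len(poses)):
        if len(selected) >= top_k:
            break
        delta = motion_delta(poses[selected[-1]], poses[idx], rot_weight)
        if delta >= motion_thresh:
            selected.append(idx)
    return selected
```

Comparing against the last kept frame rather than the previous frame means slow, steady motion still accumulates into a selection once the threshold is crossed.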

---

## Outputs & Diagnostics

Result payload (simplified):

```json
{
  "job_id": "...",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "native_fps": 29.97,
  "total_frames": 420,
  "selected_frames": [
    {
      "frame_id": "frame_000012",
      "frame_index": 12,
      "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
      "diagnostics": {
        "reason": "motion",
        "motion_delta": 0.42,
        "coverage_gain_ratio": 0.08,
        "mean_confidence": 0.67
      }
    }
  ]
}
```

Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
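
A consumer might group the returned storage keys by selection reason, for example to see how many frames were chosen for motion versus coverage. The sketch below assumes only the payload shape shown above; `keys_by_reason` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def keys_by_reason(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group selected-frame storage keys by their diagnostics 'reason'."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for frame in result.get("selected_frames", []):
        # Frames without diagnostics are bucketed under "unknown".
        reason = frame.get("diagnostics", {}).get("reason", "unknown")
        grouped[reason].append(frame["storage_key"])
    return dict(grouped)


example = json.loads("""
{
  "scene_id": "scene-123",
  "selected_frames": [
    {
      "frame_id": "frame_000012",
      "frame_index": 12,
      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
      "diagnostics": {"reason": "motion", "motion_delta": 0.42}
    }
  ]
}
""")
print(keys_by_reason(example))
```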

---

## Configuration

Environment variables for fine-tuning:

| Env Var | Default | Notes |
|---------|---------|-------|
| `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
| `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
| `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
| `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
| `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
| `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
| `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
| `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
| `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
| `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when the frame count is below this value |

Scene media API requirements:
- `STREAM3R_MEDIA_API_BASE_URL`
- `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts

Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
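
For reference, the table above could be mirrored in code roughly as follows. This is a sketch under assumptions: `KeyframeConfig` and `load_keyframe_config` are hypothetical names, and treating only `"1"` as "enabled" for `STREAM3R_KEYFRAME_PREPASS` is a guess at how the flag is parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KeyframeConfig:
    queue: str
    extract_fps: float
    extract_max_frames: int
    upload_dir: str
    top_k: int
    prepass: bool
    motion_thresh: float
    rot_weight: float
    min_gain: float
    full_max_frames: int


def load_keyframe_config(env: Mapping[str, str] = os.environ) -> KeyframeConfig:
    """Read the documented env vars, using the documented defaults."""
    get = env.get
    return KeyframeConfig(
        queue=get("STREAM3R_QUEUE_KEYFRAME", "keyframe_selection"),
        extract_fps=float(get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        extract_max_frames=int(get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        upload_dir=get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        top_k=int(get("STREAM3R_KEYFRAME_TOP_K", "16")),
        # Assumption: "1" enables the pre-pass, anything else disables it.
        prepass=get("STREAM3R_KEYFRAME_PREPASS", "1") == "1",
        motion_thresh=float(get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        rot_weight=float(get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        min_gain=float(get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        full_max_frames=int(get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check overrides in tests, e.g. `load_keyframe_config({"STREAM3R_KEYFRAME_TOP_K": "8"})`.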

---

## Integration Steps for External Services

1. **Deploy Worker Queue**
   - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the queue env var is set).
   - A GPU is not strictly required: the pre-pass runs Stream3R inference, so CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`; otherwise schedule the worker on GPU hosts.

2. **Enqueue Jobs**
   - Replace existing linear sampling code with an RQ enqueue call.
   - Store job IDs if you need to poll job status or consume events.

3. **Consume Results**
   - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries.
   - Inspect returned diagnostics for debugging or to render navigation overlays.

4. **Fallback Handling**
   - If the job fails, the queue returns error details; you can revert to your existing sampler.
   - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
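
If you do not already have a sampler to revert to, a linear fallback can be as small as the sketch below. `linear_sample_indices` is a hypothetical stand-in for an integrator's own code, not the worker's internal fallback, which may choose frames differently.

```python
from typing import List


def linear_sample_indices(total_frames: int, k: int) -> List[int]:
    """Pick up to k evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or k <= 0:
        return []
    if k >= total_frames:
        return list(range(total_frames))  # fewer frames than budget: keep all
    if k == 1:
        return [0]
    # Spread indices so the first and last frames are both included.
    step = (total_frames - 1) / (k - 1)
    return [round(i * step) for i in range(k)]
```

For the 420-frame example above with `k=16`, this yields 16 increasing indices from 0 to 419.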

---

## Benefits vs. Linear Sampling

- **Fewer redundant frames**: motion-aware spacing ensures pose diversity.
- **Better geometry coverage**: only keeps frames that add new high-confidence voxels.
- **Consistent diagnostics**: each selected frame includes reasons and confidence, aiding QA.
- **Automatic uploads**: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
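
The coverage benefit can be illustrated with a greedy set-cover sketch. It assumes each frame has already been reduced to a set of high-confidence voxel IDs and defines the gain ratio as new voxels divided by all distinct voxels; both the definition and the name `select_by_coverage` are assumptions, and the defaults mirror `STREAM3R_KEYFRAME_TOP_K` and `STREAM3R_KEYFRAME_MIN_GAIN`.

```python
from typing import Dict, List, Set


def select_by_coverage(
    frame_voxels: Dict[int, Set[int]],
    top_k: int = 16,
    min_gain: float = 0.01,
) -> List[int]:
    """Greedily pick frames that add the most not-yet-covered voxels."""
    total = len(set().union(*frame_voxels.values())) if frame_voxels else 0
    if total == 0 or top_k <= 0:
        return []
    covered: Set[int] = set()
    selected: List[int] = []
    remaining = dict(frame_voxels)
    while remaining and len(selected) < top_k:
        # Highest new-voxel count wins; ties go to the lower frame index.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = len(remaining[best] - covered) / total
        if gain < min_gain:
            break  # no remaining frame adds enough new coverage
        selected.append(best)
        covered |= remaining.pop(best)
    return selected
```

Frames are returned in selection order, so the earliest entries are the ones that contributed the most new geometry.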

---

## Future Enhancements

- Optional semantic filtering to avoid ceilings/walls.
- Exposure of thumbnails or depth maps alongside keyframes.
- Batch selection across multiple videos.

---

For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.

```