# Keyframe Selection Service
**Author:** Brian Clark
**Last Updated:** 2025-11-07
**Audience:** Integrators needing motion-aware keyframes instead of linear sampling
---
## Overview
The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).
The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.
---
## Job Inputs
| Field | Required | Description |
|-------|----------|-------------|
| `scene_id` | ✅ | Scene identifier used in `scene_media` |
| `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
| `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
| `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
| `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
| Optional filters | optional | e.g., `ceiling_percentile`, `ceiling_z_max` for downstream GLB |
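
To illustrate how `extract_fps` and `extract_max_frames` might interact, here is a minimal sketch of picking which source frames to decode. The function name `frame_indices_to_extract` and the stride logic are assumptions for illustration; the worker's actual OpenCV loop may differ.

```python
def frame_indices_to_extract(native_fps, total_video_frames,
                             extract_fps=6.0, extract_max_frames=1200):
    """Return source-frame indices sampled at roughly `extract_fps`.

    Hypothetical helper: the stride is native_fps / extract_fps, and the
    result is capped at `extract_max_frames` entries.
    """
    if native_fps <= 0 or extract_fps <= 0:
        raise ValueError("fps values must be positive")
    # Never sample faster than the video itself.
    step = max(native_fps / extract_fps, 1.0)
    indices = []
    position = 0.0
    while int(position) < total_video_frames and len(indices) < extract_max_frames:
        indices.append(int(position))
        position += step
    return indices
```

Under these assumptions, a 30 fps video sampled at 6 fps keeps every fifth frame, and the cap only matters for long recordings.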
Example payload:
```json
{
  "job_type": "keyframe_selection",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "top_k": 16,
  "extract_fps": 6.0
}
```
Enqueue via RQ:
```python
from redis import Redis
from rq import Queue

# Queue name matches STREAM3R_QUEUE_KEYFRAME (default: keyframe_selection).
queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))
queue.enqueue("worker.stream3r.jobs.handle_job", {
    "job_type": "keyframe_selection",
    "scene_id": "scene-123",
    "video_key": "scene-data/videos/scene-123.mp4",
})
```
---
## Processing Pipeline
1. **Download & Extract Frames**
   - Video pulled via `runtime.storage.download_to_path`.
   - Frames decoded with OpenCV at `extract_fps` (default 6 fps) up to `extract_max_frames`.
2. **Motion + Coverage Pre-pass**
   - Lightweight Stream3R windowed inference run to collect poses (`extrinsic`) and confidences.
   - Motion scoring: translation + weighted rotation deltas, greedily ensures pose diversity.
   - Coverage scoring: counts high-confidence voxel IDs contributed per frame, greedily maximizes new coverage.
   - Diagnostics stored per frame (reason, motion delta, coverage gain, confidence).
   - If inference fails, the worker falls back to linear sampling.
3. **Selection & Upload**
   - Selected frames copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
   - Scene media rows inserted through `record_scene_media_entries` API.
   - Manifest (`selected_frames`) returned with storage keys and diagnostics.
4. **Result**
   - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics.
   - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
   - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs and skips registration without failing the job.
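
The motion-scoring step can be sketched roughly as follows. This is not the worker's implementation: the pose representation (a 3x3 rotation plus a translation vector), the helper names `motion_delta` and `select_by_motion`, and the choice to compare each frame against the last kept frame are all assumptions. The defaults mirror `STREAM3R_KEYFRAME_MOTION_THRESH` and `STREAM3R_KEYFRAME_ROT_WEIGHT`.

```python
import math


def motion_delta(pose_a, pose_b, rot_weight=0.5):
    """Translation distance plus weighted rotation angle (radians).

    Each pose is (R, t): R is a 3x3 rotation as nested lists, t a 3-vector.
    """
    (rot_a, t_a), (rot_b, t_b) = pose_a, pose_b
    translation = math.dist(t_a, t_b)
    # trace(R_a^T R_b) equals the element-wise product sum of both matrices.
    trace = sum(rot_a[i][j] * rot_b[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return translation + rot_weight * math.acos(cos_angle)


def select_by_motion(poses, top_k=16, motion_thresh=0.4, rot_weight=0.5):
    """Greedily keep frames whose pose differs enough from the last kept one."""
    if not poses:
        return []
    selected = [0]  # assumption: the first frame is always kept
    for idx in range(1, len(poses)):
        if len(selected) >= top_k:
            break
        delta = motion_delta(poses[selected[-1]], poses[idx], rot_weight)
        if delta >= motion_thresh:
            selected.append(idx)
    return selected
```

Comparing against the last kept frame keeps the pass linear in the number of frames; a stricter variant could compare against every kept frame at higher cost.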
---
## Outputs & Diagnostics
Result payload (simplified):
```json
{
  "job_id": "...",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "native_fps": 29.97,
  "total_frames": 420,
  "selected_frames": [
    {
      "frame_id": "frame_000012",
      "frame_index": 12,
      "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
      "diagnostics": {
        "reason": "motion",
        "motion_delta": 0.42,
        "coverage_gain_ratio": 0.08,
        "mean_confidence": 0.67
      }
    }
  ]
}
```
Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
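
A consumer might reduce the result payload to the pieces it needs. The sketch below assumes the payload shape shown above; `summarize_selection` is a hypothetical helper, not part of the worker.

```python
from collections import Counter


def summarize_selection(result):
    """Return storage keys in frame order plus a count of selection reasons."""
    frames = sorted(result.get("selected_frames", []),
                    key=lambda frame: frame["frame_index"])
    keys = [frame["storage_key"] for frame in frames]
    # Frames without diagnostics are counted under "unknown".
    reasons = Counter(
        frame.get("diagnostics", {}).get("reason", "unknown") for frame in frames
    )
    return keys, dict(reasons)
```

The reason counts give a quick QA signal, for example whether a scan was selected mostly by motion or by other criteria.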
---
## Configuration
Environment variables for fine-tuning:
| Env Var | Default | Notes |
|---|---|---|
| `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
| `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
| `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
| `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
| `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
| `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
| `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
| `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
| `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
| `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when below |

Scene media API requirements:
- `STREAM3R_MEDIA_API_BASE_URL`
- `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts

Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
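
For integrators who want to mirror these settings on their side, a minimal sketch of reading the documented variables with their documented defaults is shown here. `KeyframeConfig` and `load_keyframe_config` are hypothetical names, and the way `STREAM3R_KEYFRAME_PREPASS` is parsed is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyframeConfig:
    queue: str
    extract_fps: float
    extract_max_frames: int
    upload_dir: str
    top_k: int
    prepass: bool
    motion_thresh: float
    rot_weight: float
    min_gain: float
    full_max_frames: int


def load_keyframe_config(env=None):
    """Read the documented env vars, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return KeyframeConfig(
        queue=env.get("STREAM3R_QUEUE_KEYFRAME", "keyframe_selection"),
        extract_fps=float(env.get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        extract_max_frames=int(env.get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        upload_dir=env.get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        top_k=int(env.get("STREAM3R_KEYFRAME_TOP_K", "16")),
        # Assumption: "1" enables the pre-pass and any other value disables it.
        prepass=env.get("STREAM3R_KEYFRAME_PREPASS", "1") == "1",
        motion_thresh=float(env.get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        rot_weight=float(env.get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        min_gain=float(env.get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        full_max_frames=int(env.get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    )
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the real environment.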
---
## Integration Steps for External Services
1. **Deploy Worker Queue**
   - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the env var is set).
   - A GPU is not required: the pre-pass uses Stream3R, so CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`; otherwise schedule the worker on GPU hosts.
2. **Enqueue Jobs**
   - Replace existing linear sampling code with an RQ enqueue call.
   - Store job IDs if you need to poll job status or consume events.
3. **Consume Results**
   - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries.
   - Inspect returned diagnostics for debugging or to render navigation overlays.
4. **Fallback Handling**
   - If the job fails, the queue returns error details; you can revert to your existing sampler.
   - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
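
The fallback path might look like the following sketch. Both helpers are hypothetical: `linear_sample` stands in for whatever sampler a service already has, and `run_job` is any caller-supplied function that runs the keyframe job and returns frame indices or raises on failure.

```python
def linear_sample(total_frames, top_k):
    """Pick up to `top_k` evenly spaced frame indices from `total_frames`."""
    if total_frames <= 0 or top_k <= 0:
        return []
    if top_k >= total_frames:
        return list(range(total_frames))
    if top_k == 1:
        return [0]
    # Spread picks from the first to the last frame inclusive.
    step = (total_frames - 1) / (top_k - 1)
    return [round(i * step) for i in range(top_k)]


def keyframes_or_fallback(run_job, total_frames, top_k=16):
    """Try the keyframe job; fall back to linear sampling if it raises."""
    try:
        return run_job(top_k), "keyframe_selection"
    except Exception:
        # Assumption: any job failure is treated as recoverable here.
        return linear_sample(total_frames, top_k), "linear_fallback"
```

Returning the source label alongside the indices lets downstream code record which path produced the frames.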
---
## Benefits vs. Linear Sampling
- Fewer redundant frames: motion-aware spacing ensures pose diversity.
- Better geometry coverage: only keeps frames that add new high-confidence voxels.
- Consistent diagnostics: each selected frame includes reasons and confidence, aiding QA.
- Automatic uploads: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
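
The coverage criterion can be pictured as a greedy set-cover pass. The sketch below is an illustration under assumptions, not the worker's code: `select_by_coverage` is a hypothetical name, voxel IDs are assumed to arrive as one set per frame, and the gain ratio is assumed to be new voxels divided by all observed voxels.

```python
def select_by_coverage(voxels_per_frame, top_k=16, min_gain=0.01):
    """Greedily pick frames that add the most unseen voxel IDs.

    `voxels_per_frame` maps frame index -> set of high-confidence voxel IDs.
    Stops at `top_k` frames or when the best gain ratio drops below `min_gain`.
    """
    all_voxels = set()
    for voxels in voxels_per_frame.values():
        all_voxels |= voxels
    if not all_voxels:
        return []
    covered, selected = set(), []
    remaining = dict(voxels_per_frame)
    while remaining and len(selected) < top_k:
        # Highest gain wins; ties go to the earlier frame index.
        best = max(remaining, key=lambda idx: (len(remaining[idx] - covered), -idx))
        gain_ratio = len(remaining[best] - covered) / len(all_voxels)
        if gain_ratio < min_gain:
            break
        selected.append(best)
        covered |= remaining.pop(best)
    return selected
```

With this stopping rule, a frame that only re-observes already covered voxels is never selected, which is how redundant views get dropped.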
---
## Future Enhancements
- Optional semantic filtering to avoid ceilings/walls.
- Exposure of thumbnails or depth maps alongside keyframes.
- Batch selection across multiple videos.

For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.