# Keyframe Selection Service

**Author:** Brian Clark  
**Last Updated:** 2025-11-07  
**Audience:** Integrators needing motion-aware keyframes instead of linear sampling

---

## Overview

The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).

The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.

---

## Job Inputs

| Field | Required | Description |
|-------|----------|-------------|
| `scene_id` | ✅ | Scene identifier used in `scene_media` |
| `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
| `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
| `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
| `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
| Optional filters | optional | e.g., `ceiling_percentile`, `ceiling_z_max` for downstream GLB |
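
The fields above can be assembled programmatically. The following is a minimal sketch, not the worker's own code: `build_keyframe_payload` is a hypothetical helper, and its validation rules (positive `top_k`, positive `extract_fps`) are assumptions rather than documented worker behavior. Unset optional fields are left out so the worker can apply its env-var defaults.

```python
from typing import Any, Dict, Optional


def build_keyframe_payload(
    scene_id: str,
    video_key: str,
    top_k: Optional[int] = None,
    extract_fps: Optional[float] = None,
    extract_max_frames: Optional[int] = None,
    **filters: Any,
) -> Dict[str, Any]:
    """Assemble a keyframe_selection payload (hypothetical helper)."""
    if not scene_id or not video_key:
        raise ValueError("scene_id and video_key are required")
    # Assumed sanity checks; the worker may enforce different limits.
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be a positive integer")
    if extract_fps is not None and extract_fps <= 0:
        raise ValueError("extract_fps must be positive")
    if extract_max_frames is not None and extract_max_frames < 1:
        raise ValueError("extract_max_frames must be a positive integer")

    payload: Dict[str, Any] = {
        "job_type": "keyframe_selection",
        "scene_id": scene_id,
        "video_key": video_key,
    }
    optional = {
        "top_k": top_k,
        "extract_fps": extract_fps,
        "extract_max_frames": extract_max_frames,
        **filters,  # e.g. ceiling_percentile, ceiling_z_max
    }
    # Omit unset fields so the worker falls back to its configured defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Passing optional filters as keyword arguments keeps the helper open to fields such as `ceiling_percentile` without hard-coding them.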

Example payload:

```json
{
  "job_type": "keyframe_selection",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "top_k": 16,
  "extract_fps": 6.0
}
```

Enqueue via RQ:

```python
from rq import Queue
from redis import Redis

queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))
queue.enqueue("worker.stream3r.jobs.handle_job", {
    "job_type": "keyframe_selection",
    "scene_id": "scene-123",
    "video_key": "scene-data/videos/scene-123.mp4"
})
```

---

## Processing Pipeline

1. Download & Extract Frames
   - The video is pulled via `runtime.storage.download_to_path`.
   - Frames are decoded with OpenCV at `extract_fps` (default 6 fps), up to `extract_max_frames`.
2. Motion + Coverage Pre-pass
   - A lightweight Stream3R windowed inference run collects poses (extrinsics) and confidences.
   - Motion scoring: combines translation and weighted rotation deltas, greedily ensuring pose diversity.
   - Coverage scoring: counts the high-confidence voxel IDs each frame contributes, greedily maximizing new coverage.
   - Diagnostics are stored per frame (reason, motion delta, coverage gain, confidence).
   - If inference fails, the worker falls back to linear sampling.
3. Selection & Upload
   - Selected frames are copied; images are uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
   - Scene media rows are inserted through the `record_scene_media_entries` API.
   - A manifest (`selected_frames`) is returned with storage keys and diagnostics.
4. Result
   - Job metadata includes native video FPS, total extracted frames, selected frame details, and diagnostics.
   - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
   - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs the failure and skips registration without failing the job.
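
The motion-scoring step can be sketched as follows. This is an illustration under stated assumptions, not the code in the worker: poses are assumed to be `(rotation, translation)` pairs with a 3x3 rotation matrix, the delta is assumed to be translation distance plus `rot_weight` times the relative rotation angle in radians, and a frame is kept once its delta from the last kept frame reaches the threshold. The function names are hypothetical; the defaults mirror `STREAM3R_KEYFRAME_MOTION_THRESH`, `STREAM3R_KEYFRAME_ROT_WEIGHT`, and `STREAM3R_KEYFRAME_TOP_K`.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Pose = Tuple[Matrix3, Sequence[float]]  # (rotation, translation), assumed layout


def rotation_angle(r_a: Matrix3, r_b: Matrix3) -> float:
    """Angle in radians of the relative rotation between two rotation matrices."""
    # trace(R_a^T R_b) equals the sum of element-wise products.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for safety
    return math.acos(cos_theta)


def motion_delta(a: Pose, b: Pose, rot_weight: float = 0.5) -> float:
    """Translation distance plus weighted rotation angle (assumed formula)."""
    return math.dist(a[1], b[1]) + rot_weight * rotation_angle(a[0], b[0])


def select_by_motion(
    poses: Sequence[Pose],
    motion_thresh: float = 0.4,
    rot_weight: float = 0.5,
    top_k: int = 16,
) -> List[int]:
    """Greedily keep frames that moved enough since the last kept frame."""
    if not poses or top_k < 1:
        return []
    selected = [0]  # always keep the first frame as the reference
    for idx in range(1, len(poses)):
        if len(selected) >= top_k:
            break
        if motion_delta(poses[selected[-1]], poses[idx], rot_weight) >= motion_thresh:
            selected.append(idx)
    return selected
```

Comparing against the last kept frame, rather than the previous frame, is one way to stop slow drift from being ignored; the actual worker may use a different reference.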

---

## Outputs & Diagnostics

Result payload (simplified):

```json
{
  "job_id": "...",
  "scene_id": "scene-123",
  "video_key": "scene-data/videos/scene-123.mp4",
  "native_fps": 29.97,
  "total_frames": 420,
  "selected_frames": [
    {
      "frame_id": "frame_000012",
      "frame_index": 12,
      "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
      "diagnostics": {
        "reason": "motion",
        "motion_delta": 0.42,
        "coverage_gain_ratio": 0.08,
        "mean_confidence": 0.67
      }
    }
  ]
}
```

Each uploaded frame is also inserted or updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
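
A consumer might condense the result payload for QA along these lines. This is a minimal sketch: `summarize_result` is a hypothetical helper, and it relies only on the fields shown in the simplified payload (`selected_frames`, `frame_index`, `storage_key`, `diagnostics.reason`, `total_frames`). The only `reason` value the payload example shows is `"motion"`; other values are counted as they appear.

```python
from collections import Counter
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a keyframe_selection result for QA (hypothetical helper)."""
    frames = result.get("selected_frames", [])
    total = result.get("total_frames") or 0
    # Count why each frame was selected; missing diagnostics become "unknown".
    reasons = Counter(
        frame.get("diagnostics", {}).get("reason", "unknown") for frame in frames
    )
    # Return storage keys in temporal order.
    ordered = sorted(frames, key=lambda frame: frame["frame_index"])
    return {
        "scene_id": result.get("scene_id"),
        "storage_keys": [frame["storage_key"] for frame in ordered],
        "reasons": dict(reasons),
        "selection_ratio": len(frames) / total if total else 0.0,
    }
```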


---

## Configuration

Environment variables for fine-tuning:

| Env Var | Default | Notes |
|---------|---------|-------|
| `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
| `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
| `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
| `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
| `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
| `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
| `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
| `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
| `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
| `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when below |

Scene media API requirements:

- `STREAM3R_MEDIA_API_BASE_URL`
- `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts

Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
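
For integrators who want to mirror these settings in their own tooling, the tuning variables could be loaded roughly as follows. This is a sketch under assumptions: `KeyframeConfig` is a hypothetical class, not part of the worker, and the boolean parsing of `STREAM3R_KEYFRAME_PREPASS` (treating `1`, `true`, `yes`, `on` as enabled) is a guess at the worker's behavior. The variable names and defaults are the documented ones.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")
_TRUTHY = {"1", "true", "yes", "on"}  # assumed truthy spellings


@dataclass(frozen=True)
class KeyframeConfig:
    """Hypothetical mirror of the documented tuning variables and defaults."""
    queue: str = "keyframe_selection"
    extract_fps: float = 6.0
    extract_max_frames: int = 1200
    upload_dir: str = "keyframes"
    top_k: int = 16
    prepass: bool = True
    motion_thresh: float = 0.4
    rot_weight: float = 0.5
    min_gain: float = 0.01
    full_max_frames: int = 24

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "KeyframeConfig":
        d = cls()

        def get(name: str, fallback: T, cast: Callable[[str], T]) -> T:
            # Unset or empty variables keep the documented default.
            raw = env.get(name)
            return cast(raw) if raw not in (None, "") else fallback

        return cls(
            queue=get("STREAM3R_QUEUE_KEYFRAME", d.queue, str),
            extract_fps=get("STREAM3R_KEYFRAME_EXTRACT_FPS", d.extract_fps, float),
            extract_max_frames=get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", d.extract_max_frames, int),
            upload_dir=get("STREAM3R_KEYFRAME_UPLOAD_DIR", d.upload_dir, str),
            top_k=get("STREAM3R_KEYFRAME_TOP_K", d.top_k, int),
            prepass=get("STREAM3R_KEYFRAME_PREPASS", d.prepass,
                        lambda v: v.strip().lower() in _TRUTHY),
            motion_thresh=get("STREAM3R_KEYFRAME_MOTION_THRESH", d.motion_thresh, float),
            rot_weight=get("STREAM3R_KEYFRAME_ROT_WEIGHT", d.rot_weight, float),
            min_gain=get("STREAM3R_KEYFRAME_MIN_GAIN", d.min_gain, float),
            full_max_frames=get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", d.full_max_frames, int),
        )
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests, e.g. `KeyframeConfig.from_env({"STREAM3R_KEYFRAME_TOP_K": "8"})`.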


---

## Integration Steps for External Services

1. Deploy Worker Queue
   - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the `STREAM3R_QUEUE_KEYFRAME` env var is set).
   - A GPU is not strictly required, but the pre-pass runs Stream3R inference: CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`; otherwise schedule the worker on GPU hosts.
2. Enqueue Jobs
   - Replace existing linear sampling code with an RQ enqueue call.
   - Store job IDs if you need to poll job status or consume events.
3. Consume Results
   - After completion, list `scene_media` for `media_type=image` to retrieve the new keyframe entries.
   - Inspect the returned diagnostics for debugging or to render navigation overlays.
4. Fallback Handling
   - If the job fails, the queue returns error details; you can revert to your existing sampler.
   - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
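
For the fallback path, a caller without an existing sampler could use an evenly spaced selection such as the one below. This is a generic linear sampler written for illustration; `linear_sample_indices` is a hypothetical name and is not claimed to match the worker's internal fallback.

```python
from typing import List


def linear_sample_indices(total_frames: int, top_k: int) -> List[int]:
    """Pick up to top_k evenly spaced frame indices, including both endpoints."""
    if total_frames <= 0 or top_k <= 0:
        return []
    if top_k >= total_frames:
        return list(range(total_frames))  # fewer frames than the budget
    if top_k == 1:
        return [0]
    # Integer arithmetic keeps indices exact and strictly increasing.
    return [(i * (total_frames - 1)) // (top_k - 1) for i in range(top_k)]
```

For example, `linear_sample_indices(10, 4)` yields `[0, 3, 6, 9]`.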

---

## Benefits vs. Linear Sampling

- Fewer redundant frames: motion-aware spacing ensures pose diversity.
- Better geometry coverage: only keeps frames that add new high-confidence voxels.
- Consistent diagnostics: each selected frame includes reasons and confidence, aiding QA.
- Automatic uploads: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
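
To make the coverage benefit concrete, greedy coverage selection can be sketched as a set-cover loop. The details here are assumptions for illustration: each frame is represented by the set of high-confidence voxel IDs it observes, the gain ratio is taken as newly covered voxels divided by all observed voxels, and selection stops when the best gain falls below `min_gain` (default mirroring `STREAM3R_KEYFRAME_MIN_GAIN`). The function name is hypothetical.

```python
from typing import Dict, List, Set


def select_by_coverage(
    frame_voxels: Dict[int, Set[int]],
    top_k: int = 16,
    min_gain: float = 0.01,
) -> List[int]:
    """Greedily pick frames that add the most not-yet-covered voxel IDs."""
    total = len(set().union(*frame_voxels.values())) if frame_voxels else 0
    if total == 0 or top_k < 1:
        return []
    covered: Set[int] = set()
    selected: List[int] = []
    remaining = dict(frame_voxels)  # shallow copy so the input is not mutated
    while remaining and len(selected) < top_k:
        # Ties are broken in favor of the lowest frame index.
        best = max(sorted(remaining), key=lambda idx: len(remaining[idx] - covered))
        gain = len(remaining[best] - covered)
        if gain / total < min_gain:
            break  # no remaining frame adds enough new coverage
        selected.append(best)
        covered |= remaining.pop(best)
    return sorted(selected)
```

With frames `{0: {1, 2, 3}, 1: {2, 3}, 2: {4, 5, 6, 7}, 3: {7}}`, frames 2 and 0 together cover every voxel, so frames 1 and 3 are dropped as redundant.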

---

## Future Enhancements

- Optional semantic filtering to avoid ceilings/walls.
- Exposure of thumbnails or depth maps alongside keyframes.
- Batch selection across multiple videos.

For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.