	```markdown
	# Keyframe Selection Service

	Author: Brian Clark
	Last Updated: 2025-11-07
	Audience: Integrators needing motion-aware keyframes instead of linear sampling

	---

	## Overview

	The keyframe selection worker ingests a raw video (Backblaze/S3 key), extracts frames, runs motion + coverage analysis, and uploads the chosen keyframes back to the media store while recording them in the `scene_media` table. Unlike linear FPS sampling, it keeps only the most informative views (~10–20 frames for a typical 30 s scan).

	The workflow is exposed as an RQ job (`keyframe_selection`). Another service (e.g., scene-graph-manager) can enqueue this job instead of doing its own downsampling.

	---

	## Job Inputs

	| Field | Required | Description |
	|-------|----------|-------------|
	| `scene_id` | ✅ | Scene identifier used in `scene_media` |
	| `video_key` | ✅ | Storage key (Backblaze/S3) for the source video |
	| `top_k` | optional | Desired maximum keyframes; defaults to `STREAM3R_KEYFRAME_TOP_K` |
	| `extract_fps` | optional | Override extraction FPS (default `STREAM3R_KEYFRAME_EXTRACT_FPS`) |
	| `extract_max_frames` | optional | Cap on total decoded frames (default `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES`) |
	| Optional filters | optional | e.g., `ceiling_percentile`, `ceiling_z_max` for downstream GLB |

	Example payload:

	```json
	{
	  "job_type": "keyframe_selection",
	  "scene_id": "scene-123",
	  "video_key": "scene-data/videos/scene-123.mp4",
	  "top_k": 16,
	  "extract_fps": 6.0
	}
	```

	Enqueue via RQ:

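	```python
	from rq import Queue
	from redis import Redis

	# "redis://" resolves to the local default Redis; replace it with your deployment's URL.
	queue = Queue("keyframe_selection", connection=Redis.from_url("redis://"))

	# The job function is referenced by its import path, so the caller
	# does not need the worker package installed.
	queue.enqueue("worker.stream3r.jobs.handle_job", {
	    "job_type": "keyframe_selection",
	    "scene_id": "scene-123",
	    "video_key": "scene-data/videos/scene-123.mp4",
	})
	```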
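
Callers that build payloads in several places may want a small helper that enforces the required fields from the table above. The following is a minimal sketch, not part of the worker: `build_keyframe_payload` is a hypothetical name, and the positivity check on optional fields is an assumption about what the worker accepts.

```python
from typing import Any, Dict, Optional


def build_keyframe_payload(
    scene_id: str,
    video_key: str,
    top_k: Optional[int] = None,
    extract_fps: Optional[float] = None,
    extract_max_frames: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a keyframe_selection payload, omitting unset optional fields."""
    if not scene_id or not video_key:
        raise ValueError("scene_id and video_key are required")

    payload: Dict[str, Any] = {
        "job_type": "keyframe_selection",
        "scene_id": scene_id,
        "video_key": video_key,
    }
    optional = {
        "top_k": top_k,
        "extract_fps": extract_fps,
        "extract_max_frames": extract_max_frames,
    }
    for name, value in optional.items():
        if value is None:
            continue  # leave it out so the worker applies its env-var default
        if value <= 0:
            # Assumption: non-positive overrides are never meaningful.
            raise ValueError(f"{name} must be positive, got {value!r}")
        payload[name] = value
    return payload
```

The returned dict can be passed directly as the second argument to `queue.enqueue(...)`.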

	---

	## Processing Pipeline

	1. Download & Extract Frames
	   - Video pulled via `runtime.storage.download_to_path`.
	   - Frames decoded with OpenCV at `extract_fps` (default 6 fps), up to `extract_max_frames`.

	2. Motion + Coverage Pre-pass
	   - Lightweight Stream3R windowed inference runs to collect poses (`extrinsic`) and confidences.
	   - Motion scoring: combines translation and weighted rotation deltas, then greedily picks frames to ensure pose diversity.
	   - Coverage scoring: counts the high-confidence voxel IDs each frame contributes and greedily maximizes new coverage.
	   - Diagnostics stored per frame (reason, motion delta, coverage gain, confidence).
	   - If inference fails, the worker falls back to linear sampling.

	3. Selection & Upload
	   - Selected frames copied; images uploaded via `runtime.storage.upload_file` under `keyframe_upload_dir` (default `keyframes`).
	   - Scene media rows inserted through the `record_scene_media_entries` API.
	   - Manifest (`selected_frames`) returned with storage keys and diagnostics.

	4. Result
	   - Job metadata includes: native video FPS, total extracted frames, selected frame details, and diagnostics.
	   - `selected_frames.json` (optional) can be stored by downstream jobs for auditing.
	   - Scene-media registration is attempted via the configured API; if the endpoint does not accept POST (e.g., legacy deployments), the worker logs the failure and skips registration without failing the job.
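
To make the motion-scoring step in the pre-pass more concrete, here is a minimal sketch of one way to do greedy pose-diversity selection. It is an illustration under stated assumptions, not the code in `stream3r/worker/keyframes.py`: each pose is taken to be a (rotation matrix, camera position) pair already derived from the extrinsics, the delta is translation distance plus `rot_weight` times the relative rotation angle in radians, and a frame is kept once its delta from the last kept frame reaches `motion_thresh`. The function names are hypothetical; the default values mirror `STREAM3R_KEYFRAME_MOTION_THRESH`, `STREAM3R_KEYFRAME_ROT_WEIGHT`, and `STREAM3R_KEYFRAME_TOP_K`.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]
Pose = Tuple[Matrix3, Vector3]  # (rotation, camera position); this layout is an assumption


def rotation_angle(r_a: Matrix3, r_b: Matrix3) -> float:
    """Angle in radians of the relative rotation r_a^T @ r_b."""
    # trace(r_a^T @ r_b) equals the sum of element-wise products.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def motion_delta(a: Pose, b: Pose, rot_weight: float = 0.5) -> float:
    """Translation distance plus weighted rotation angle between two poses."""
    return math.dist(a[1], b[1]) + rot_weight * rotation_angle(a[0], b[0])


def select_by_motion(
    poses: Sequence[Pose],
    motion_thresh: float = 0.4,
    rot_weight: float = 0.5,
    top_k: int = 16,
) -> List[int]:
    """Greedily keep frames that moved enough since the last kept frame."""
    if not poses or top_k <= 0:
        return []
    selected = [0]  # always keep the first frame as the reference
    for index in range(1, len(poses)):
        if len(selected) >= top_k:
            break
        last_kept = poses[selected[-1]]
        if motion_delta(last_kept, poses[index], rot_weight) >= motion_thresh:
            selected.append(index)
    return selected
```

Comparing against the last kept frame rather than the previous frame means slow, steady motion still produces a keyframe once it accumulates past the threshold.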

	---

	## Outputs & Diagnostics

	Result payload (simplified):

	```json
	{
	  "job_id": "...",
	  "scene_id": "scene-123",
	  "video_key": "scene-data/videos/scene-123.mp4",
	  "native_fps": 29.97,
	  "total_frames": 420,
	  "selected_frames": [
	    {
	      "frame_id": "frame_000012",
	      "frame_index": 12,
	      "url": "s3://bucket/scenes/scene-123/keyframes/frame_000012.jpg",
	      "storage_key": "scenes/scene-123/keyframes/frame_000012.jpg",
	      "diagnostics": {
	        "reason": "motion",
	        "motion_delta": 0.42,
	        "coverage_gain_ratio": 0.08,
	        "mean_confidence": 0.67
	      }
	    }
	  ]
	}
	```

	Each uploaded frame is also inserted/updated in the `scene_media` table via the Scene Graph API (`/scenes/{scene_id}/media`).
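
For QA it can help to condense the result payload into a few numbers. The sketch below assumes only the simplified payload shape shown above; `summarize_selection` is a hypothetical helper, and any fields missing from a real payload are tolerated rather than treated as errors.

```python
from collections import Counter
from typing import Any, Dict


def summarize_selection(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a keyframe_selection result into counts and a mean confidence."""
    frames = result.get("selected_frames", [])

    # Count how many frames were kept for each diagnostic reason.
    reasons = Counter(
        frame.get("diagnostics", {}).get("reason", "unknown") for frame in frames
    )

    # Average only over frames that actually report a confidence.
    confidences = [
        frame["diagnostics"]["mean_confidence"]
        for frame in frames
        if "mean_confidence" in frame.get("diagnostics", {})
    ]
    mean_confidence = sum(confidences) / len(confidences) if confidences else None

    return {
        "selected": len(frames),
        "total_frames": result.get("total_frames"),
        "by_reason": dict(reasons),
        "mean_confidence": mean_confidence,
    }
```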

	---

	## Configuration

	Environment variables for fine-tuning:

	| Env Var | Default | Notes |
	|---------|---------|-------|
	| `STREAM3R_QUEUE_KEYFRAME` | `keyframe_selection` | RQ queue name |
	| `STREAM3R_KEYFRAME_EXTRACT_FPS` | `6.0` | Extraction FPS |
	| `STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES` | `1200` | Extraction cap |
	| `STREAM3R_KEYFRAME_UPLOAD_DIR` | `keyframes` | Storage subdirectory |
	| `STREAM3R_KEYFRAME_TOP_K` | `16` | Default selection budget |
	| `STREAM3R_KEYFRAME_PREPASS` | `1` | Enable motion/coverage inference |
	| `STREAM3R_KEYFRAME_MOTION_THRESH` | `0.4` | Motion threshold |
	| `STREAM3R_KEYFRAME_ROT_WEIGHT` | `0.5` | Rotation weight |
	| `STREAM3R_KEYFRAME_MIN_GAIN` | `0.01` | Min coverage gain |
	| `STREAM3R_KEYFRAME_FULL_MAX_FRAMES` | `24` | Switch to full attention when the frame count is below this value |

	Scene media API requirements:
	- `STREAM3R_MEDIA_API_BASE_URL`
	- `STREAM3R_MEDIA_API_TOKEN` for authenticated inserts

	Ceiling trimming (optional) can be set per job via `ceiling_percentile`, `ceiling_margin`, or `ceiling_z_max` so downstream GLBs remain clean.
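
A service that wants to mirror the worker's defaults can read the same variables. The following is a rough sketch: the variable names and defaults come from the table above, while `KeyframeConfig`, `load_keyframe_config`, and the way the `STREAM3R_KEYFRAME_PREPASS` flag is parsed are assumptions, not the worker's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KeyframeConfig:
    queue: str
    extract_fps: float
    extract_max_frames: int
    upload_dir: str
    top_k: int
    prepass: bool
    motion_thresh: float
    rot_weight: float
    min_gain: float
    full_max_frames: int


def _flag(value: str) -> bool:
    # Assumption: anything other than an empty/"0"/"false"/"no" value enables the flag.
    return value.strip().lower() not in {"", "0", "false", "no"}


def load_keyframe_config(env: Mapping[str, str] = os.environ) -> KeyframeConfig:
    """Read keyframe settings from env, falling back to the documented defaults."""
    get = env.get
    return KeyframeConfig(
        queue=get("STREAM3R_QUEUE_KEYFRAME", "keyframe_selection"),
        extract_fps=float(get("STREAM3R_KEYFRAME_EXTRACT_FPS", "6.0")),
        extract_max_frames=int(get("STREAM3R_KEYFRAME_EXTRACT_MAX_FRAMES", "1200")),
        upload_dir=get("STREAM3R_KEYFRAME_UPLOAD_DIR", "keyframes"),
        top_k=int(get("STREAM3R_KEYFRAME_TOP_K", "16")),
        prepass=_flag(get("STREAM3R_KEYFRAME_PREPASS", "1")),
        motion_thresh=float(get("STREAM3R_KEYFRAME_MOTION_THRESH", "0.4")),
        rot_weight=float(get("STREAM3R_KEYFRAME_ROT_WEIGHT", "0.5")),
        min_gain=float(get("STREAM3R_KEYFRAME_MIN_GAIN", "0.01")),
        full_max_frames=int(get("STREAM3R_KEYFRAME_FULL_MAX_FRAMES", "24")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.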

	---

	## Integration Steps for External Services

	1. Deploy Worker Queue
	   - Run the Stream3R worker with `--queue keyframe_selection` (already the default when the `STREAM3R_QUEUE_KEYFRAME` env var is set).
	   - A GPU is not strictly required, but the pre-pass runs Stream3R inference: CPU-only environments should set `STREAM3R_MODEL_DEVICE=cpu`; otherwise schedule the worker on GPU hosts.

	2. Enqueue Jobs
	   - Replace existing linear sampling code with an RQ enqueue call.
	   - Store job IDs if you need to poll job status or consume events.

	3. Consume Results
	   - After completion, list `scene_media` for `media_type=image` to retrieve new keyframe entries.
	   - Inspect returned diagnostics for debugging or to render navigation overlays.

	4. Fallback Handling
	   - If the job fails, the queue returns error details; you can revert to your existing sampler.
	   - Consider scheduling a retry with adjusted parameters (e.g., lower `top_k`).
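
For services that do not already have a sampler to revert to, a linear fallback can be as simple as picking evenly spaced frame indices. This is a minimal sketch under the assumption that the caller knows the total frame count; `linear_sample_indices` is a hypothetical helper and is not necessarily how the worker's own fallback spaces its frames.

```python
from typing import List


def linear_sample_indices(total_frames: int, top_k: int) -> List[int]:
    """Return up to top_k evenly spaced frame indices, including first and last."""
    if total_frames <= 0 or top_k <= 0:
        return []
    if top_k >= total_frames:
        return list(range(total_frames))  # fewer frames than budget: keep all
    if top_k == 1:
        return [0]
    # Spread indices across [0, total_frames - 1] with a fractional step.
    step = (total_frames - 1) / (top_k - 1)
    return [round(i * step) for i in range(top_k)]
```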

	---

	## Benefits vs. Linear Sampling

	- Fewer redundant frames: motion-aware spacing ensures pose diversity.
	- Better geometry coverage: only keeps frames that add new high-confidence voxels.
	- Consistent diagnostics: each selected frame includes reasons and confidence, aiding QA.
	- Automatic uploads: frames stored in Backblaze/local storage with `scene_media` entries ready for viewers.
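
The coverage benefit above comes from greedy selection over per-frame voxel sets. The sketch below illustrates the idea under stated assumptions: each frame is represented by the set of high-confidence voxel IDs it observes, the gain ratio is defined as new voxels divided by all voxels seen in the clip, and selection stops once the best remaining gain drops below `min_gain`. `select_by_coverage` is a hypothetical name, and the worker's actual definition of `coverage_gain_ratio` may differ.

```python
from typing import Dict, List, Set


def select_by_coverage(
    frame_voxels: Dict[int, Set[int]],
    top_k: int = 16,
    min_gain: float = 0.01,
) -> List[int]:
    """Greedily pick frames that add the most not-yet-covered voxels."""
    all_voxels: Set[int] = set().union(*frame_voxels.values())
    if not all_voxels or top_k <= 0:
        return []

    covered: Set[int] = set()
    selected: List[int] = []
    remaining = dict(frame_voxels)  # shallow copy so the input is not mutated

    while remaining and len(selected) < top_k:
        # Best frame = most new voxels; ties go to the lower frame index.
        best = max(remaining, key=lambda idx: (len(remaining[idx] - covered), -idx))
        gain_ratio = len(remaining[best] - covered) / len(all_voxels)
        if gain_ratio < min_gain:
            break  # nothing left adds meaningful coverage
        selected.append(best)
        covered |= remaining.pop(best)
    return selected
```

Frames are returned in selection order, so the first entries are the ones that contributed the most new geometry.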

	---

	## Future Enhancements

	- Optional semantic filtering to avoid ceilings/walls.
	- Exposure of thumbnails or depth maps alongside keyframes.
	- Batch selection across multiple videos.

	---

	For questions or integration support, contact the Stream3R team or refer to `stream3r/worker/keyframes.py` for implementation details.

	```