---
title: TASKER Keyframe Extractor
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.15.1
app_file: app.py
short_description: VLM-guided tree-search keyframe extraction from videos
python_version: '3.12'
startup_duration_timeout: 30m
---
TASKER Keyframe Extractor
This Space demonstrates TASKER (Task-driven and Scene-aware Keyframe searcher), a keyframe extraction algorithm from the ECCV 2026 paper "Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction".
How it works
TASKER reformulates keyframe extraction as a generalized graph-search problem:
- The input video is divided into segments, which form the nodes of a search tree.
- A Vision-Language Model (Qwen2.5-VL-7B) estimates which segments are likely to contain crucial actions not yet captured by the selected keyframes.
- The selected segments are expanded (split at visual change points).
- Visual deduplication filters out near-identical frames.
- The search terminates when the VLM's confidence reaches 3 or higher, or when a frame limit is reached.
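The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Space's implementation: frames are stand-in feature vectors, `score_fn` is a hypothetical placeholder for the VLM call that returns one relevance score per open segment plus an integer confidence, a segment is split at the frame with the largest L1 change from its predecessor, and deduplication uses a hypothetical L1 threshold. Only one segment is expanded per step, which is closest to the greedy strategy.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)
Feature = Sequence[float]  # hypothetical per-frame feature (e.g. a color histogram)


def l1(a: Feature, b: Feature) -> float:
    """L1 distance between two per-frame feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def change_point(frames: List[Feature], seg: Segment) -> int:
    """Frame inside seg that differs most from its predecessor (hypothetical split rule)."""
    start, end = seg
    return max(range(start + 1, end), key=lambda i: l1(frames[i - 1], frames[i]))


def extract_keyframes(
    frames: List[Feature],
    score_fn: Callable[[List[Segment], List[int]], Tuple[List[float], int]],
    max_frames: int = 8,
    min_conf: int = 3,
    dedup_thresh: float = 0.1,
) -> List[int]:
    """Sketch of a VLM-guided search over a tree of video segments."""
    frontier: List[Segment] = [(0, len(frames))]  # root: the whole video
    keyframes: List[int] = []
    while frontier and len(keyframes) < max_frames:
        # Stand-in for the VLM: one score per open segment plus a confidence.
        scores, confidence = score_fn(frontier, keyframes)
        if confidence >= min_conf:
            break  # the "VLM" is confident enough to stop
        best = max(range(len(frontier)), key=lambda i: scores[i])
        seg = frontier.pop(best)
        if seg[1] - seg[0] < 2:
            continue  # too short to split further
        cut = change_point(frames, seg)
        frontier += [(seg[0], cut), (cut, seg[1])]  # expand into two children
        # Visual deduplication: keep the cut frame only if it differs from all kept frames.
        if all(l1(frames[cut], frames[k]) > dedup_thresh for k in keyframes):
            keyframes.append(cut)
    return sorted(keyframes)
```

With three flat "scenes" and a toy scorer that prefers the longest segment and reports one confidence point per kept keyframe, the search cuts at the two scene boundaries first, then splits a flat segment, and stops once the confidence reaches 3.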
Four search strategies are available:
- A* (default): balances goal relevance and visual state changes
- BFS: broad exploration, can select multiple segments per step
- GBFS: greedy best-first, focuses on goal-critical actions
- Dijkstra: focuses on maximum visual state transitions
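One way to picture how the four strategies differ is as different rules for picking which open segment(s) to expand next. The sketch below is hypothetical: `goal_relevance` and `visual_change` are placeholder scores (higher is better), A* is modeled loosely as their sum in the spirit of f = g + h, GBFS uses goal relevance alone, Dijkstra uses visual change alone, and BFS takes up to `k` segments in frontier order. The actual cost definitions in TASKER may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    goal_relevance: float  # hypothetical VLM estimate of relevance to the task
    visual_change: float   # hypothetical measure of visual state change in the segment


def priority(node: Node, strategy: str) -> float:
    """Higher is better; a loose, hypothetical scoring rule per strategy."""
    if strategy == "astar":
        return node.goal_relevance + node.visual_change  # combine both signals
    if strategy == "gbfs":
        return node.goal_relevance  # greedy: goal relevance only
    if strategy == "dijkstra":
        return node.visual_change  # visual state transitions only
    raise ValueError(f"unknown strategy: {strategy}")


def select(frontier: List[Node], strategy: str, k: int = 2) -> List[Node]:
    """Pick the segment(s) to expand in this step."""
    if strategy == "bfs":
        # Breadth-first: expand up to k segments in frontier (FIFO) order.
        return frontier[:k]
    return [max(frontier, key=lambda n: priority(n, strategy))]
```

Because BFS ignores the scores and returns several segments, it trades focus for coverage, while the other three expand a single best-scoring segment per step.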
Usage
- Upload a video file
- Enter a task query (e.g., "How to send an email with an attachment?")
- Select a search strategy
- Click "Extract Keyframes"
The app returns a gallery of keyframes labeled with their timestamps and frame indices.
Model
Uses Qwen/Qwen2.5-VL-7B-Instruct as the VLM for segment evaluation, running on ZeroGPU.
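The termination rule depends on reading a confidence value out of the VLM's text output. The exact prompt and reply schema are not documented here, so the sketch below assumes a hypothetical reply containing lines such as `Segments: 2, 5` and `Confidence: 3`; `parse_vlm_reply` and `should_stop` are illustrative names, not functions from the Space. A reply without a parseable confidence is treated as 0 so the search keeps going.

```python
import re
from typing import List, Tuple

# Hypothetical reply format; the Space's real prompt and output schema may differ.
_SEG_RE = re.compile(r"segments?\s*:\s*([\d,\s]+)", re.IGNORECASE)
_CONF_RE = re.compile(r"confidence\s*:\s*(\d+)", re.IGNORECASE)


def parse_vlm_reply(text: str) -> Tuple[List[int], int]:
    """Extract selected segment ids and an integer confidence from a VLM reply."""
    seg_match = _SEG_RE.search(text)
    conf_match = _CONF_RE.search(text)
    segments = [int(s) for s in re.findall(r"\d+", seg_match.group(1))] if seg_match else []
    confidence = int(conf_match.group(1)) if conf_match else 0  # missing -> keep searching
    return segments, confidence


def should_stop(confidence: int, n_frames: int, max_frames: int, min_conf: int = 3) -> bool:
    """Stop when the VLM is confident enough or the frame budget is spent."""
    return confidence >= min_conf or n_frames >= max_frames
```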