Frame Extraction & Character Matching

This package turns raw video into character reference catalogs and lets you match new frames against those references. It is designed to be deployed quickly (e.g., on Hugging Face Spaces) for interactive character discovery.

Features

  • Shot-aware frame sampling to keep only useful stills.
  • Face detection, embedding, and clustering (MTCNN + InceptionResnet).
  • Automatic reference selection per character (sharpest, most frontal crop).
  • JSON catalog output and optional reference thumbnails.
  • Matching API/CLI for user-uploaded frames with multi-character support.
  • Gradio app template ready for Hugging Face hosting.
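The "sharpest crop" criterion in the reference-selection feature can be illustrated with a variance-of-Laplacian score. This is a minimal sketch, not the package's actual implementation: the function names `laplacian_variance` and `pick_reference` are hypothetical, the crops are assumed to be grayscale NumPy arrays of at least 3x3 pixels, and the "most frontal" criterion is not covered here.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness score: variance of a 4-neighbour Laplacian over interior pixels."""
    g = gray.astype(np.float64)
    # Sum of the four neighbours minus four times the centre pixel.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def pick_reference(crops: list[np.ndarray]) -> int:
    """Return the index of the crop with the highest sharpness score."""
    if not crops:
        raise ValueError("at least one crop is required")
    scores = [laplacian_variance(c) for c in crops]
    return int(np.argmax(scores))
```

A blurred or flat crop has little high-frequency content, so its Laplacian response varies less than that of a crisp crop and it scores lower.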

Install

cd projects/UMO-Qwen-Edit/data_curation_scripts/frame_extraction
pip install -e .

CLI Usage

Build a catalog from a video

frame-catalog catalog \
  --video-path data/source.mp4 \
  --output-dir outputs/catalog \
  --frame-interval 12 \
  --min-track-length 5
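One plausible reading of the two numeric flags is sketched below. It assumes that `--frame-interval` means "keep every Nth frame" and that `--min-track-length` discards face tracks seen in fewer than N sampled frames; neither meaning is confirmed by this README, and the helper names are hypothetical.

```python
def sample_frame_indices(total_frames: int, frame_interval: int) -> list[int]:
    """Assumed semantics: keep frame 0 and every `frame_interval`-th frame after it."""
    if frame_interval < 1:
        raise ValueError("frame_interval must be >= 1")
    return list(range(0, total_frames, frame_interval))


def keep_long_tracks(
    tracks: dict[str, list[int]], min_track_length: int
) -> dict[str, list[int]]:
    """Assumed semantics: drop tracks observed in fewer than `min_track_length` frames."""
    return {
        track_id: frames
        for track_id, frames in tracks.items()
        if len(frames) >= min_track_length
    }
```

Under these assumptions, a larger interval produces fewer stills, and a larger minimum track length filters out faces that appear only briefly.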

Match new frames against the catalog

frame-catalog match \
  --catalog-path outputs/catalog/catalog.json \
  --frames-dir uploads/ \
  --output-path outputs/matches.json
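Conceptually, matching compares each face embedding from a new frame against the catalog's reference embeddings. The sketch below uses cosine similarity with a fixed threshold; the function name `match_embeddings`, the threshold value, and the result fields are assumptions made for illustration and may differ from what the CLI does internally. Embeddings are assumed to be non-zero vectors.

```python
import numpy as np


def match_embeddings(face_embs, ref_embs, ref_ids, threshold=0.6):
    """Assign each face embedding to its most similar reference, or None below threshold."""
    faces = np.asarray(face_embs, dtype=np.float64)
    refs = np.asarray(ref_embs, dtype=np.float64)
    # L2-normalise rows so the dot product equals cosine similarity.
    faces = faces / np.linalg.norm(faces, axis=1, keepdims=True)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    sims = faces @ refs.T  # shape: (num_faces, num_references)
    best = sims.argmax(axis=1)
    results = []
    for i, j in enumerate(best):
        score = float(sims[i, j])
        results.append({
            "character_id": ref_ids[j] if score >= threshold else None,
            "similarity": score,
        })
    return results
```

Because every detected face in a frame is matched independently, a single frame can yield several character IDs, which is one way to realise multi-character support.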

Deploy on Hugging Face Spaces

  1. Copy this folder to a new Space (Gradio SDK).
  2. Run pip install -e . to install the dependencies.
  3. Upload a pre-built catalog/catalog.json plus the references/ images.
  4. Set environment variables in the Space:
    • FRAME_CATALOG=/home/user/app/catalog/catalog.json
    • FRAME_OUTPUT_DIR=/home/user/app/output
  5. Set the Space entrypoint to python -m frame_extraction.app.
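The two environment variables from step 4 can be read at startup along these lines. This is a sketch only: `load_settings` is a hypothetical helper, and the fallback paths used when a variable is unset are assumptions, not documented defaults of the app.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Return (catalog_path, output_dir) from FRAME_CATALOG and FRAME_OUTPUT_DIR."""
    env = os.environ if env is None else env
    # Fallback values are illustrative assumptions for local runs.
    catalog_path = Path(env.get("FRAME_CATALOG", "catalog/catalog.json"))
    output_dir = Path(env.get("FRAME_OUTPUT_DIR", "output"))
    return catalog_path, output_dir
```

Passing a plain dict as `env` makes the lookup easy to check locally before configuring the Space.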

Outputs

  • catalog.json: character reference metadata with embeddings and chosen frames.
  • references/: cropped reference images per character.
  • matches.json: mapping from user frames to character IDs with similarity scores.
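As an illustration of consuming matches.json, the sketch below assumes a hypothetical layout in which each frame name maps to a list of objects with `character_id` and `similarity` keys. The real schema is not specified in this README, so adjust the field names to the file the CLI actually writes.

```python
import json
from pathlib import Path


def load_matches(path):
    """Read a matches file from disk as a plain dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def characters_per_frame(matches, min_similarity=0.0):
    """Map each frame to the sorted character IDs at or above `min_similarity`.

    Assumed (hypothetical) layout:
    {"frame.jpg": [{"character_id": "char_0", "similarity": 0.91}, ...]}
    """
    summary = {}
    for frame_name, hits in matches.items():
        ids = {
            hit["character_id"]
            for hit in hits
            if hit["similarity"] >= min_similarity
        }
        summary[frame_name] = sorted(ids)
    return summary
```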

Roadmap

  • Integrate more robust trackers (DeepSort/ByteTrack).
  • Add active learning loop for manual character corrections.
  • Expose REST endpoints for automated ingestion.