
title: Image To Video Assignment
emoji: 📚
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false

Assignment 2: Image-to-Video Semantic Retrieval via Object Detection

This repository contains the deliverables for Assignment 2.

Files

  • video_detections.parquet: car-part detections indexed over the exterior-only segment of the input video.
  • retrieval_clips.parquet: for each query image, the returned 2–3 second clip window (timestamps plus a clickable YouTube link).

Processing Summary

  • Video ID (corpus): YcvECxtXoxQ
  • Exterior segment used: 18:39 to 24:43
  • Frame sampling: 1 fps
  • Detector: YOLOv8 segmentation fine-tuned on Ultralytics carparts-seg
  • Inference thresholds: confidence = 0.25, IoU = 0.5
  • Retrieval: detect the top-K classes in the query image (K = 2), match them against the indexed detections with temporal smoothing, and output a fixed short clip (2–3 seconds).
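
The retrieval step above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function names, the moving-sum smoothing, the window size, and the example class labels are all assumptions made for the sketch.

```python
from collections import Counter


def top_k_classes(query_dets, k=2):
    """Return the k labels with the highest confidence in the query image.

    query_dets: iterable of (class_label, confidence) pairs (hypothetical input shape).
    """
    best = {}
    for label, score in query_dets:
        best[label] = max(score, best.get(label, 0.0))
    ranked = sorted(best.items(), key=lambda kv: -kv[1])
    return [label for label, _ in ranked[:k]]


def retrieve_clip(index_rows, classes, clip_len=3, window=1):
    """Pick a short clip around the densest run of matching detections.

    index_rows: iterable of (timestamp_sec, class_label) pairs from the index.
    Smoothing here is a moving sum over +/- window seconds (an assumption).
    """
    wanted = set(classes)
    counts = Counter(t for t, label in index_rows if label in wanted)
    if not counts:
        return None  # no matching detections in the index
    smoothed = {
        t: sum(counts.get(t + d, 0) for d in range(-window, window + 1))
        for t in counts
    }
    # Ties resolve to the earliest timestamp because the keys are sorted first.
    peak = max(sorted(smoothed), key=lambda t: smoothed[t])
    start = max(0, peak - clip_len // 2)
    return start, start + clip_len


# Illustrative data only; labels and scores are made up.
query = [("wheel", 0.9), ("hood", 0.6), ("front_door", 0.3)]
index = [(1120, "wheel"), (1121, "wheel"), (1121, "hood"),
         (1122, "wheel"), (1200, "wheel")]
classes = top_k_classes(query, k=2)
clip = retrieve_clip(index, classes)
```

With this toy index the matching detections cluster around second 1121, so the sketch returns a 3-second window centered there.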

Schema Description for the Parquet Files

video_detections.parquet schema

Each row corresponds to a single detection in one sampled video frame.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| frame_index | int | sampled frame index from extracted frames |
| timestamp_sec | int | timestamp (seconds) in original YouTube video timeline |
| class_id | int | detector class id |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | [x_min, y_min, x_max, y_max] |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |

retrieval_clips.parquet schema

Each row corresponds to one query image and the retrieved clip.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| clip_id | string | unique clip id (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
| youtube_embed_url | string | clickable YouTube embed link with start/end |
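
A link of the kind stored in youtube_embed_url could be built as sketched below. The `start` and `end` query parameters are standard YouTube embedded-player parameters; whether the file uses exactly this URL shape is an assumption, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def youtube_embed_url(video_id, start, end):
    """Return an embed URL that plays from start to end (whole seconds)."""
    query = urlencode({"start": int(start), "end": int(end)})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


# Example clip bounds are illustrative.
url = youtube_embed_url("YcvECxtXoxQ", 1120, 1123)
```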