---
title: Image To Video Assignment
emoji: ๐
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
---
Assignment 2: Image-to-Video Semantic Retrieval via Object Detection
This repository contains the deliverables for Assignment 2.
Files
- video_detections.parquet: car-part detections indexed over the exterior-only segment of the input video.
- retrieval_clips.parquet: for each query image, the returned 2–3 second clip window (timestamps plus a clickable YouTube link).
Processing Summary
- Video ID (corpus): YcvECxtXoxQ
- Exterior segment used: 18:39 to 24:43
- Frame sampling: 1 fps
- Detector: YOLOv8 segmentation model fine-tuned on the Ultralytics carparts-seg dataset
- Inference thresholds: confidence = 0.25, IoU = 0.5
- Retrieval: detect the top-K classes in the query image (K = 2), match them against the indexed detections with temporal smoothing, and output a fixed short clip (2–3 seconds).
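The retrieval step above can be sketched in a few lines of Python. This is a minimal sketch, not the assignment's actual code: the scoring rule (per-second sum of confidences of detections whose label is a query class), the centered moving-sum smoothing, the `window` and `clip_len` parameters, and the function name `retrieve_clip` are all assumptions made for illustration.

```python
from collections import defaultdict


def retrieve_clip(
    detections: list[tuple[int, str, float]],
    query_classes: list[str],
    window: int = 3,
    clip_len: int = 3,
) -> tuple[int, int, int]:
    """Return (start_timestamp, end_timestamp, supporting_detections).

    detections: (timestamp_sec, class_label, confidence_score) tuples.
    Assumed scoring: per-second sum of confidences for the query classes,
    smoothed with a centered moving sum over `window` seconds.
    """
    wanted = set(query_classes)
    score: dict[int, float] = defaultdict(float)
    count: dict[int, int] = defaultdict(int)
    for t, label, conf in detections:
        if label in wanted:
            score[t] += conf
            count[t] += 1
    if not score:
        raise ValueError("no indexed detections match the query classes")

    half = window // 2

    def smoothed(t: int) -> float:
        # Moving sum of per-second scores over [t - half, t + half].
        return sum(score.get(u, 0.0) for u in range(t - half, t + half + 1))

    # Highest smoothed score wins; iterating in sorted order sends ties to the earliest second.
    best = max(sorted(score), key=smoothed)
    support = sum(count.get(u, 0) for u in range(best - half, best + half + 1))
    # Center a fixed-length clip on the best second, clamped at 0.
    start = max(best - clip_len // 2, 0)
    return start, start + clip_len, support


# Hypothetical detections: (timestamp_sec, class_label, confidence_score)
dets = [
    (1120, "front_bumper", 0.9),
    (1121, "front_bumper", 0.8),
    (1122, "hood", 0.7),
    (1200, "hood", 0.95),
    (1300, "wheel", 0.99),
]
print(retrieve_clip(dets, ["front_bumper", "hood"]))  # (1120, 1123, 3)
```

With `clip_len=3` the returned window stays within the 2–3 second range stated above; a wider `window` trades temporal precision for robustness against single-frame misses.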
Schema Description for the Parquet Files
video_detections.parquet schema
Each row corresponds to a single detection in one sampled video frame.
| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| frame_index | int | sampled frame index from extracted frames |
| timestamp_sec | int | timestamp (seconds) on the original YouTube video timeline |
| class_id | int | detector class id |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | [x_min, y_min, x_max, y_max] |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |
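Because frames are sampled at 1 fps from the 18:39 to 24:43 segment, `frame_index` and `timestamp_sec` can be related by a simple offset. The sketch below assumes frame 0 is the first sampled frame of that segment (18:39 = 1119 s), which the schema does not state explicitly; the helper names, the detector name string, and the class id/label pair in the example are hypothetical.

```python
SEGMENT_START_SEC = 18 * 60 + 39  # 18:39 -> 1119 s, start of the exterior segment
SEGMENT_END_SEC = 24 * 60 + 43    # 24:43 -> 1483 s, end of the exterior segment
FPS = 1                           # frame sampling rate from the processing summary


def frame_to_timestamp(frame_index: int) -> int:
    """Map a sampled frame index to seconds on the original video timeline.

    Assumption: frame 0 is the first sampled frame of the exterior segment.
    """
    ts = SEGMENT_START_SEC + frame_index // FPS
    if not SEGMENT_START_SEC <= ts <= SEGMENT_END_SEC:
        raise ValueError(f"frame {frame_index} falls outside the exterior segment")
    return ts


def detection_row(video_id: str, frame_index: int, class_id: int, class_label: str,
                  box: tuple[float, float, float, float], confidence: float,
                  detector_name: str) -> dict:
    """Build one video_detections.parquet row (hypothetical helper)."""
    x_min, y_min, x_max, y_max = (float(v) for v in box)
    return {
        "video_id": video_id,
        "frame_index": frame_index,
        "timestamp_sec": frame_to_timestamp(frame_index),
        "class_id": class_id,
        "class_label": class_label,
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
        # Same four values as the scalar columns, kept as one list column.
        "bounding_box": [x_min, y_min, x_max, y_max],
        "confidence_score": float(confidence),
        "detector_name": detector_name,
    }


# Illustrative values only; class_id 3 is not the real carparts-seg mapping.
row = detection_row("YcvECxtXoxQ", 5, 3, "front_bumper", (10, 20, 110, 220), 0.87,
                    "yolov8-seg-carparts")
print(row["timestamp_sec"])  # 1124
```

A list of such dicts could then be written with `pandas.DataFrame(rows).to_parquet("video_detections.parquet")`, which requires pyarrow or fastparquet to be installed.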
retrieval_clips.parquet schema
Each row corresponds to one query image and the retrieved clip.
| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| clip_id | string | unique clip id (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | number of supporting detections in the matched segment before the short clip is selected |
| youtube_embed_url | string | clickable YouTube embed link with start/end times |
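Two of the columns above can be derived mechanically: the top-K query classes and the embed URL. The sketch below is illustrative only; it assumes each class's query score is the maximum confidence over that class's detections in the query image, and `top_classes` and `youtube_embed_url` are hypothetical helper names. The `start` and `end` query parameters (in whole seconds) are supported by YouTube's embedded player.

```python
from urllib.parse import urlencode


def top_classes(labels: list[str], scores: list[float], k: int = 2) -> tuple[list[str], list[float]]:
    """Return the k highest-scoring distinct labels and their scores.

    Assumption: a class detected several times in the query image keeps
    its maximum confidence.
    """
    best: dict[str, float] = {}
    for label, score in zip(labels, scores):
        if score > best.get(label, float("-inf")):
            best[label] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]
    return [label for label, _ in ranked], [score for _, score in ranked]


def youtube_embed_url(video_id: str, start: int, end: int) -> str:
    """Build an embed link that starts playback at `start` and stops at `end` (seconds)."""
    return f"https://www.youtube.com/embed/{video_id}?" + urlencode({"start": start, "end": end})


# Hypothetical query detections
labels = ["hood", "front_bumper", "hood", "wheel"]
scores = [0.6, 0.9, 0.8, 0.3]
print(top_classes(labels, scores, k=2))  # (['front_bumper', 'hood'], [0.9, 0.8])
print(youtube_embed_url("YcvECxtXoxQ", 1120, 1123))
# https://www.youtube.com/embed/YcvECxtXoxQ?start=1120&end=1123
```

If the columns are derived this way, `k=5` would yield query_top_class_labels and query_top_class_scores, while `k=2` would yield classes_in_query.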