---
title: Image To Video Assignment
emoji: π
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
---
# Assignment 2: Image-to-Video Semantic Retrieval via Object Detection

This repository contains the deliverables for Assignment 2.
## Files

- **video_detections.parquet**: car-part detections indexed over the exterior-only segment of the input video.
- **retrieval_clips.parquet**: for each query image, the returned 2–3 second clip window (timestamps plus a clickable YouTube link).
## Processing Summary

- **Video ID (corpus)**: `YcvECxtXoxQ`
- **Exterior segment used**: `18:39` to `24:43`
- **Frame sampling**: 1 fps
- **Detector**: YOLOv8 segmentation fine-tuned on Ultralytics `carparts-seg`
- **Inference thresholds**: confidence = 0.25, IoU = 0.5
- **Retrieval**: detect the top-K classes in the query image (K=2), match them against the indexed detections with temporal smoothing, and output a fixed short clip (2–3 seconds).
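
The retrieval step summarized above can be sketched in plain Python. This is an illustration under stated assumptions, not the assignment's actual code: the helper names (`frame_to_timestamp`, `top_k_classes`, `merge_segments`, `pick_clip`), the one-second gap tolerance used for smoothing, the rule of centering the clip on the longest matching segment, and the assumption that frame index 0 corresponds to the start of the exterior segment are all hypothetical. Only the segment bounds, the 1 fps sampling, K=2, and the short clip length come from the summary.

```python
SEGMENT_START_SEC = 18 * 60 + 39  # 18:39 in the original video timeline
SEGMENT_END_SEC = 24 * 60 + 43    # 24:43 in the original video timeline
FPS = 1                           # one sampled frame per second


def frame_to_timestamp(frame_index):
    """Map a sampled frame index to seconds (assumes index 0 is the segment start)."""
    return SEGMENT_START_SEC + frame_index // FPS


def top_k_classes(labels, scores, k=2):
    """Return the k distinct labels with the highest confidence scores."""
    ranked = sorted(zip(scores, labels), reverse=True)
    seen = []
    for _, label in ranked:
        if label not in seen:
            seen.append(label)
    return seen[:k]


def matching_seconds(detections, query_classes):
    """Sorted timestamps at which at least one query class was detected."""
    wanted = set(query_classes)
    return sorted({d["timestamp_sec"] for d in detections if d["class_label"] in wanted})


def merge_segments(seconds, max_gap=1):
    """Temporal smoothing: merge hits separated by at most max_gap missing seconds."""
    segments = []
    for t in seconds:
        if segments and t - segments[-1][1] <= max_gap + 1:
            segments[-1][1] = t
        else:
            segments.append([t, t])
    return [tuple(s) for s in segments]


def pick_clip(segments, clip_len=3):
    """Return a fixed-length (start, end) clip centered on the longest segment."""
    if not segments:
        return None
    start, end = max(segments, key=lambda s: s[1] - s[0])
    mid = (start + end) // 2
    clip_start = max(SEGMENT_START_SEC, mid - clip_len // 2)
    clip_start = min(clip_start, SEGMENT_END_SEC - clip_len)
    return clip_start, clip_start + clip_len


# Illustrative detections only; labels and timestamps are made up for the example.
detections = [
    {"timestamp_sec": 1120, "class_label": "hood"},
    {"timestamp_sec": 1121, "class_label": "hood"},
    {"timestamp_sec": 1123, "class_label": "hood"},
    {"timestamp_sec": 1130, "class_label": "hood"},
    {"timestamp_sec": 1140, "class_label": "wheel"},
]
query = top_k_classes(["hood", "wheel", "hood", "front_bumper"], [0.9, 0.6, 0.8, 0.3])
clip = pick_clip(merge_segments(matching_seconds(detections, query)))
print(query, clip)  # ['hood', 'wheel'] (1120, 1123)
```

Merging nearby hits before choosing the clip keeps a single missed detection from splitting a continuous shot into two short segments, which is one plausible reading of "temporal smoothing" here.
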
## Schema Description for the Parquet Files

## video_detections.parquet schema

Each row corresponds to a single detection in one sampled video frame.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video ID |
| frame_index | int | sampled frame index from the extracted frames |
| timestamp_sec | int | timestamp (seconds) in the original YouTube video timeline |
| class_id | int | detector class ID |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | `[x_min, y_min, x_max, y_max]` |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |
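
To make the row layout concrete, the sketch below mirrors the schema as a Python dataclass and adds a basic consistency check. It is only an illustration: the `Detection` class, the `is_consistent` helper, and every value in the sample row (including the `class_id` and `detector_name`) are hypothetical, and the 0.25 lower bound simply reuses the confidence threshold from the processing summary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    """One row of video_detections.parquet (column names follow the schema table)."""
    video_id: str
    frame_index: int
    timestamp_sec: int
    class_id: int
    class_label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence_score: float
    detector_name: str

    @property
    def bounding_box(self) -> list[float]:
        # Same ordering as the bounding_box column: [x_min, y_min, x_max, y_max].
        return [self.x_min, self.y_min, self.x_max, self.y_max]


def is_consistent(d: Detection, conf_threshold: float = 0.25) -> bool:
    """Check that the box is well-formed and the score passed the threshold."""
    box_ok = d.x_min <= d.x_max and d.y_min <= d.y_max
    score_ok = conf_threshold <= d.confidence_score <= 1.0
    return box_ok and score_ok


# Hypothetical sample row; none of these values are taken from the real file.
row = Detection(
    video_id="YcvECxtXoxQ",
    frame_index=1,
    timestamp_sec=1120,
    class_id=0,
    class_label="hood",
    x_min=100.0,
    y_min=50.0,
    x_max=400.0,
    y_max=300.0,
    confidence_score=0.82,
    detector_name="example-detector",
)
print(row.bounding_box, is_consistent(row))
```
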
## retrieval_clips.parquet schema

Each row corresponds to one query image and the retrieved clip.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video ID |
| clip_id | string | unique clip ID (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
| youtube_embed_url | string | clickable YouTube embed link with start/end |
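
As an illustration of how `youtube_embed_url` relates to `video_id`, `start_timestamp`, and `end_timestamp`, the sketch below builds a link using YouTube's embed path with the `start` and `end` player parameters (whole seconds). The function name `build_embed_url` is hypothetical, and whether the stored links follow exactly this form is an assumption.

```python
from urllib.parse import urlencode


def build_embed_url(video_id: str, start: int, end: int) -> str:
    """Build an embed link that plays only the [start, end] window, in seconds."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = urlencode({"start": start, "end": end})
    return f"https://www.youtube.com/embed/{video_id}?{params}"


# Example with the corpus video ID and an illustrative 3-second window.
url = build_embed_url("YcvECxtXoxQ", 1120, 1123)
print(url)  # https://www.youtube.com/embed/YcvECxtXoxQ?start=1120&end=1123
```
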