---
title: Image To Video Assignment
emoji: 📚
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
---

# Assignment 2: Image-to-Video Semantic Retrieval via Object Detection

This repository contains the deliverables for Assignment 2.

## Files

- **video_detections.parquet**: car part detections indexed over the exterior-only segment of the input video.
- **retrieval_clips.parquet**: for each query image, the returned 2–3 second clip window (timestamps + clickable YouTube link).

## Processing Summary

- **Video ID (corpus)**: `YcvECxtXoxQ`
- **Exterior segment used**: `18:39` to `24:43`
- **Frame sampling**: 1 fps
- **Detector**: YOLOv8 segmentation fine-tuned on Ultralytics `carparts-seg`
- **Inference thresholds**: confidence = 0.25, IoU = 0.5
- **Retrieval**: detect top-K classes in query (K=2), match against indexed detections with temporal smoothing, output a fixed short clip (2–3 seconds).

## Schema Description for the Parquet Files

### video_detections.parquet schema

Each row corresponds to a single detection in one sampled video frame.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| frame_index | int | sampled frame index from extracted frames |
| timestamp_sec | int | timestamp (seconds) in original YouTube video timeline |
| class_id | int | detector class id |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | `[x_min, y_min, x_max, y_max]` |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |

### retrieval_clips.parquet schema

Each row corresponds to one query image and the retrieved clip.


| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| clip_id | string | unique clip id (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
| youtube_embed_url | string | clickable YouTube embed link with start/end |
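
The sketch below illustrates how the fields above could fit together: converting the `18:39` to `24:43` segment bounds to seconds, mapping a 1 fps `frame_index` to `timestamp_sec`, and turning matched detections into a short clip with an embed link. It is not the repository's actual code. The helper names (`mmss_to_seconds`, `frame_to_timestamp`, `embed_url`, `retrieve_clip`), the `max_gap` and `clip_len` parameters, the gap-merging form of temporal smoothing, the assumption that `frame_index` is zero-based within the exterior segment, and the sample class labels are all hypothetical choices made for illustration.

```python
from collections import Counter
from urllib.parse import urlencode

VIDEO_ID = "YcvECxtXoxQ"
FPS = 1  # frame sampling rate from the processing summary


def mmss_to_seconds(mmss):
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


SEGMENT_START = mmss_to_seconds("18:39")  # 1119
SEGMENT_END = mmss_to_seconds("24:43")    # 1483


def frame_to_timestamp(frame_index):
    """Assumption: frame_index is zero-based within the exterior segment."""
    return SEGMENT_START + frame_index // FPS


def embed_url(video_id, start, end):
    """Build a YouTube embed link using the player's start/end parameters."""
    query = urlencode({"start": start, "end": end})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


def retrieve_clip(detections, query_classes, max_gap=2, clip_len=3):
    """Pick a short clip from (timestamp_sec, class_label) detections.

    Hypothetical smoothing: matched timestamps at most `max_gap` seconds
    apart are merged into one segment; the segment with the most
    supporting detections wins, and the clip starts at its first second.
    """
    support = Counter(t for t, label in detections if label in query_classes)
    if not support:
        return None

    segments = []
    for t in sorted(support):
        if segments and t - segments[-1][-1] <= max_gap:
            segments[-1].append(t)
        else:
            segments.append([t])

    best = max(segments, key=lambda seg: sum(support[t] for t in seg))
    start = best[0]
    end = min(start + clip_len, SEGMENT_END)
    return {
        "start_timestamp": start,
        "end_timestamp": end,
        "number_of_supporting_detections": sum(support[t] for t in best),
        "youtube_embed_url": embed_url(VIDEO_ID, start, end),
    }


# Illustrative detections only; labels and timestamps are made up.
sample = [
    (1120, "hood"), (1121, "hood"), (1121, "front_bumper"),
    (1123, "hood"), (1200, "hood"),
]
clip = retrieve_clip(sample, {"hood", "front_bumper"})
print(clip)
```

Merging nearby timestamps before counting support means one missed frame at 1 fps does not split an otherwise continuous shot, which is one simple way to read "temporal smoothing"; the real pipeline may smooth differently.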