---
title: Image To Video Assignment
emoji: ๐
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
---
# Assignment 2: Image-to-Video Semantic Retrieval via Object Detection
This repository contains the deliverables for Assignment 2.
## Files
- **video_detections.parquet**: car-part detections indexed over the exterior-only segment of the input video.
- **retrieval_clips.parquet**: for each query image, the returned 2–3 second clip window (timestamps + clickable YouTube link).
## Processing Summary
- **Video ID (corpus)**: `YcvECxtXoxQ`
- **Exterior segment used**: `18:39` to `24:43`
- **Frame sampling**: 1 fps
- **Detector**: YOLOv8 segmentation fine-tuned on Ultralytics `carparts-seg`
- **Inference thresholds**: confidence = 0.25, IoU = 0.5
- **Retrieval**: detect the top-K classes in the query image (K=2), match them against the indexed detections with temporal smoothing, and output a fixed short clip (2–3 seconds).
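
The retrieval step above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the deliverables: the function names `top_k_classes` and `best_clip`, the moving-sum form of temporal smoothing, and the `window` and `clip_len` defaults are all assumptions made for the example.

```python
from collections import Counter


def top_k_classes(query_detections, k=2):
    """Return the k labels with the highest confidence in the query image.

    query_detections: iterable of (class_label, confidence_score) pairs.
    """
    best = {}
    for label, score in query_detections:
        best[label] = max(score, best.get(label, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def best_clip(index_rows, classes, clip_len=3, window=3):
    """Pick a short clip around the second with the most matching detections.

    index_rows: iterable of (timestamp_sec, class_label) pairs from the index.
    Smoothing here is a plain moving sum over `window` seconds (an assumption).
    """
    wanted = set(classes)
    hits = Counter(ts for ts, label in index_rows if label in wanted)
    if not hits:
        return None  # no frame contains any of the query classes

    half = window // 2

    def smoothed(t):
        return sum(hits.get(t + d, 0) for d in range(-half, half + 1))

    peak = max(range(min(hits), max(hits) + 1), key=smoothed)
    start = peak - clip_len // 2
    return start, start + clip_len


# Toy data, not taken from the real index.
query = [("wheel", 0.9), ("hood", 0.6), ("wheel", 0.5), ("mirror", 0.3)]
index = [(1120, "wheel"), (1121, "wheel"), (1122, "wheel"),
         (1122, "hood"), (1200, "wheel")]
classes = top_k_classes(query)    # ['wheel', 'hood']
clip = best_clip(index, classes)  # (1120, 1123)
```

Smoothing before picking the peak keeps a single isolated detection (the one at 1200 s in the toy data) from outranking a run of consecutive matching frames.
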
## Schema Description for the Parquet Files
## video_detections.parquet schema
Each row corresponds to a single detection in one sampled video frame.
| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| frame_index | int | sampled frame index from extracted frames |
| timestamp_sec | int | timestamp (seconds) in original YouTube video timeline |
| class_id | int | detector class id |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | `[x_min, y_min, x_max, y_max]` |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |
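
As an illustration of the schema, the sketch below builds one row as a plain dictionary. It assumes that `frame_index` counts from 0 at the start of the exterior segment (18:39, i.e. 1119 s) and that frames are sampled at 1 fps. The helper `make_detection_row`, the `detector_name` value, and the class id in the usage line are hypothetical placeholders.

```python
SEGMENT_START_SEC = 18 * 60 + 39  # 18:39 in the original video = 1119 s
SAMPLE_FPS = 1                    # one sampled frame per second


def make_detection_row(frame_index, class_id, class_label, box, confidence,
                       video_id="YcvECxtXoxQ",
                       detector_name="yolov8-carparts-seg"):  # placeholder name
    """Build one row matching the video_detections.parquet columns."""
    x_min, y_min, x_max, y_max = (float(v) for v in box)
    return {
        "video_id": video_id,
        "frame_index": int(frame_index),
        # Assumption: frame 0 is the first sampled frame of the segment.
        "timestamp_sec": SEGMENT_START_SEC + int(frame_index) // SAMPLE_FPS,
        "class_id": int(class_id),
        "class_label": class_label,
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
        "bounding_box": [x_min, y_min, x_max, y_max],
        "confidence_score": float(confidence),
        "detector_name": detector_name,
    }


# Class id 3 is an arbitrary value for the example, not the real id of "hood".
row = make_detection_row(10, 3, "hood", (12, 30, 200, 180), 0.81)
```
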
## retrieval_clips.parquet schema
Each row corresponds to one query image and the retrieved clip.
| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| clip_id | string | unique clip id (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
| youtube_embed_url | string | clickable YouTube embed link with start/end |
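
A link of the kind stored in `youtube_embed_url` could be built as shown below. The exact URL form used in the file is not given here, so this sketch assumes the standard YouTube embed player with its `start` and `end` parameters (both in whole seconds). The helper name `youtube_embed_url` is chosen for the example.

```python
from urllib.parse import urlencode


def youtube_embed_url(video_id, start_timestamp, end_timestamp):
    """Return an embed link that plays only the given clip window."""
    query = urlencode({"start": int(start_timestamp), "end": int(end_timestamp)})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


url = youtube_embed_url("YcvECxtXoxQ", 1120, 1123)
# https://www.youtube.com/embed/YcvECxtXoxQ?start=1120&end=1123
```
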