---
title: Image To Video Assignment
emoji: 📚
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
---

# Assignment 2: Image-to-Video Semantic Retrieval via Object Detection

This repository contains the deliverables for Assignment 2.

## Files
- **video_detections.parquet**: car-part detections indexed over the exterior-only segment of the input video.
- **retrieval_clips.parquet**: for each query image, the returned 2–3 second clip window (timestamps + clickable YouTube link).

## Processing Summary
- **Video ID (corpus)**: `YcvECxtXoxQ`
- **Exterior segment used**: `18:39` to `24:43`
- **Frame sampling**: 1 fps
- **Detector**: YOLOv8 segmentation fine-tuned on Ultralytics `carparts-seg`
- **Inference thresholds**: confidence = 0.25, IoU = 0.5
- **Retrieval**: detect the top-K classes in the query image (K=2), match them against the indexed detections with temporal smoothing, and output a fixed short clip (2–3 seconds).
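
The retrieval step can be illustrated with a minimal sketch. It is not the assignment's actual implementation: the scoring rule (summed confidence per second), the moving-average smoothing, the `window` and `clip_len` parameters, and the function names are all assumptions chosen for illustration. Only the segment bounds and the 1 fps sampling come from the summary above, and the class labels in the usage example are illustrative.

```python
from collections import defaultdict


def mmss_to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Exterior segment bounds taken from the processing summary.
SEGMENT_START = mmss_to_seconds("18:39")
SEGMENT_END = mmss_to_seconds("24:43")


def retrieve_clip(detections, query_classes, window=3, clip_len=3):
    """Return (start, end) in seconds for the best-matching clip, or None.

    detections: iterable of (timestamp_sec, class_label, confidence) tuples.
    query_classes: class labels detected in the query image.
    window, clip_len: hypothetical parameters, not taken from the assignment.
    """
    wanted = set(query_classes)

    # Per-second score: summed confidence of detections matching the query.
    score = defaultdict(float)
    for timestamp, label, confidence in detections:
        if label in wanted:
            score[timestamp] += confidence
    if not score:
        return None

    # Temporal smoothing: centered moving average over `window` seconds.
    half = window // 2

    def smoothed(t):
        return sum(score.get(t + d, 0.0) for d in range(-half, half + 1)) / window

    # Pick the second with the highest smoothed score (earliest on ties).
    peak = max(range(SEGMENT_START, SEGMENT_END + 1), key=smoothed)

    # Center a short clip on the peak, clamped to the exterior segment.
    start = max(SEGMENT_START, peak - clip_len // 2)
    end = min(SEGMENT_END, start + clip_len)
    return start, end


# Illustrative detections; labels and scores are made up.
example = [
    (1200, "wheel", 0.9),
    (1201, "wheel", 0.8),
    (1202, "wheel", 0.7),
    (1300, "hood", 0.95),
]
print(retrieve_clip(example, ["wheel"]))  # (1200, 1203)
```

Smoothing before picking the peak favors runs of consecutive matching frames over a single isolated high-confidence detection, which is one plausible reason to smooth at all when the index is sampled at only 1 fps.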

## Schema Description for the Parquet Files
## video_detections.parquet schema
Each row corresponds to a single detection in one sampled video frame.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| frame_index | int | index of the sampled frame among the extracted frames |
| timestamp_sec | int | timestamp (seconds) in original YouTube video timeline |
| class_id | int | detector class id |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | `[x_min, y_min, x_max, y_max]` |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |
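
Because the table stores the box both as four scalar columns and as a `bounding_box` list, a small consistency check can be useful when reading the file. The sketch below works on a plain dict per row. The function name, the example values, and the `detector_name` string are hypothetical, and the checks are assumptions about what a well-formed row looks like rather than rules stated by the assignment.

```python
def validate_detection_row(row: dict) -> list:
    """Return a list of human-readable problems found in one detection row."""
    problems = []

    # The list column should repeat the four scalar coordinates in order.
    box = [row["x_min"], row["y_min"], row["x_max"], row["y_max"]]
    if list(row["bounding_box"]) != box:
        problems.append("bounding_box does not match x_min/y_min/x_max/y_max")

    # Left/top should not exceed right/bottom.
    if row["x_min"] > row["x_max"] or row["y_min"] > row["y_max"]:
        problems.append("box corners are out of order")

    # Detector confidences are expected to lie in [0, 1].
    if not 0.0 <= row["confidence_score"] <= 1.0:
        problems.append("confidence_score outside [0, 1]")

    return problems


# Illustrative row; every value except video_id is made up.
example_row = {
    "video_id": "YcvECxtXoxQ",
    "frame_index": 81,
    "timestamp_sec": 1200,
    "class_id": 0,
    "class_label": "wheel",
    "x_min": 100.0,
    "y_min": 220.5,
    "x_max": 340.0,
    "y_max": 460.0,
    "bounding_box": [100.0, 220.5, 340.0, 460.0],
    "confidence_score": 0.87,
    "detector_name": "example-detector",  # hypothetical identifier
}
print(validate_detection_row(example_row))  # []
```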

## retrieval_clips.parquet schema
Each row corresponds to one query image and the retrieved clip.

| Column | Type | Description |
|---|---|---|
| video_id | string | YouTube video id |
| clip_id | string | unique clip id (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
| youtube_embed_url | string | clickable YouTube embed link with start/end |
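
As an illustration of how `youtube_embed_url` could be derived from `video_id`, `start_timestamp`, and `end_timestamp`, the sketch below uses the `start` and `end` query parameters of the standard YouTube embedded player. The exact URL format stored in the file is not documented here, so treat the helper name and the output format as assumptions.

```python
from urllib.parse import urlencode


def youtube_embed_url(video_id: str, start: int, end: int) -> str:
    """Build an embed link that plays from `start` to `end` (whole seconds)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    query = urlencode({"start": start, "end": end})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


# Example with an illustrative 3-second window inside the exterior segment.
print(youtube_embed_url("YcvECxtXoxQ", 1200, 1203))
# https://www.youtube.com/embed/YcvECxtXoxQ?start=1200&end=1203
```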