title: Image To Video Assignment
emoji: 📚
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false

Assignment 2: Image-to-Video Semantic Retrieval via Object Detection

This repository contains the deliverables for Assignment 2.

video_detections.parquet: car-part detections indexed over the exterior-only segment of the input video.
retrieval_clips.parquet: for each query image, the returned 2–3 second clip window (start and end timestamps plus a clickable YouTube link).

Video ID (corpus): YcvECxtXoxQ
Exterior segment used: 18:39 to 24:43
Frame sampling: 1 fps
Detector: YOLOv8 segmentation fine-tuned on Ultralytics carparts-seg
Inference thresholds: confidence = 0.25, IoU = 0.5
Retrieval: detect the top-K classes in the query image (K = 2), match them against the indexed detections with temporal smoothing, and output a fixed short clip (2–3 seconds).
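
The retrieval step listed above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the repository's actual code: the function names `top_k_classes` and `retrieve_clip`, the 5-second smoothing window, and the 3-second clip length are assumptions made for the example, and "temporal smoothing" is interpreted here as a sliding-window sum of matching detections per second.

```python
from collections import Counter


def top_k_classes(query_detections, k=2):
    """Return the k labels with the highest confidence in the query image.

    query_detections: list of (class_label, confidence) pairs.
    """
    best = {}
    for label, conf in query_detections:
        # Keep only the strongest detection per class.
        best[label] = max(conf, best.get(label, 0.0))
    # Highest confidence first; alphabetical order breaks ties.
    return sorted(best, key=lambda lbl: (-best[lbl], lbl))[:k]


def retrieve_clip(index_rows, query_classes, clip_len=3, window=5):
    """Pick a clip_len-second window where the query classes are best supported.

    index_rows: list of (timestamp_sec, class_label) detections.
    Returns (start_timestamp, end_timestamp, support), or None if no match.
    """
    wanted = set(query_classes)
    # Count matching detections per second of video.
    hits = Counter(ts for ts, label in index_rows if label in wanted)
    if not hits:
        return None
    half = window // 2

    def smoothed(ts):
        # Assumed smoothing: sum of matches in a window centred on ts.
        return sum(hits.get(t, 0) for t in range(ts - half, ts + half + 1))

    # Best centre: highest smoothed score, then most raw hits, then earliest.
    centre = min(hits, key=lambda ts: (-smoothed(ts), -hits[ts], ts))
    start = centre - clip_len // 2
    return start, start + clip_len, smoothed(centre)
```

Summing over a window favours seconds that sit inside a sustained run of matches over isolated single-frame detections, which is one plausible reading of why smoothing is applied before the short clip is cut.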

In video_detections.parquet, each row corresponds to a single detection in one sampled video frame.

| Column | Type | Description |
| --- | --- | --- |
| video_id | string | YouTube video id |
| frame_index | int | sampled frame index from extracted frames |
| timestamp_sec | int | timestamp (seconds) in original YouTube video timeline |
| class_id | int | detector class id |
| class_label | string | detected car-part label |
| x_min | float | bounding box left |
| y_min | float | bounding box top |
| x_max | float | bounding box right |
| y_max | float | bounding box bottom |
| bounding_box | list[float] | `[x_min, y_min, x_max, y_max]` |
| confidence_score | float | detection confidence |
| detector_name | string | model identifier |
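
To make the detection schema concrete, here is a hedged sketch of how a single row might be assembled. The helper name `make_detection_row` is hypothetical. The sketch also assumes that `frame_index` counts from 0 at the start of the exterior segment (18:39, i.e. 1119 s), so that at 1 fps `timestamp_sec = 1119 + frame_index`. The repository may number frames differently.

```python
SEGMENT_START_SEC = 18 * 60 + 39  # exterior segment starts at 18:39 -> 1119 s
SAMPLE_FPS = 1                    # one sampled frame per second


def make_detection_row(video_id, frame_index, class_id, class_label,
                       box, confidence_score, detector_name):
    """Build one row matching the video_detections.parquet columns."""
    x_min, y_min, x_max, y_max = (float(v) for v in box)
    if x_min > x_max or y_min > y_max:
        raise ValueError("box must be [x_min, y_min, x_max, y_max]")
    return {
        "video_id": video_id,
        "frame_index": int(frame_index),
        # Assumption: frame 0 is the first sampled frame of the segment.
        "timestamp_sec": SEGMENT_START_SEC + int(frame_index) // SAMPLE_FPS,
        "class_id": int(class_id),
        "class_label": class_label,
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
        "bounding_box": [x_min, y_min, x_max, y_max],
        "confidence_score": float(confidence_score),
        "detector_name": detector_name,
    }
```

Under these assumptions, frame 10 of the segment maps to second 1129 of the original video (18:49). Any class id, label, or detector name passed in is illustrative rather than taken from the real files.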

In retrieval_clips.parquet, each row corresponds to one query image and the retrieved clip.

| Column | Type | Description |
| --- | --- | --- |
| video_id | string | YouTube video id |
| clip_id | string | unique clip id (one per query row) |
| query_index | int | query image row index in the HF dataset |
| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
| classes_in_query | list[string] | top-K detected classes from the query image |
| query_top_class_labels | list[string] | top-5 query class labels |
| query_top_class_scores | list[float] | top-5 query class confidence scores |
| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
| strategy | string | retrieval strategy used |
| start_timestamp | int | returned clip start time (seconds) |
| end_timestamp | int | returned clip end time (seconds) |
| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
| youtube_embed_url | string | clickable YouTube embed link with start/end |
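
As an illustration of the `youtube_embed_url` column, the sketch below builds an embed link from a video id and the clip's start and end times using YouTube's `start` and `end` embed parameters. The helper name `youtube_embed_url` and the exact URL layout are assumptions; the links stored in the file may be formatted differently.

```python
from urllib.parse import urlencode


def youtube_embed_url(video_id, start_timestamp, end_timestamp):
    """Return an embed URL that plays only [start_timestamp, end_timestamp]."""
    if end_timestamp <= start_timestamp:
        raise ValueError("end_timestamp must be greater than start_timestamp")
    # The embed player accepts whole seconds for both parameters.
    query = urlencode({"start": int(start_timestamp), "end": int(end_timestamp)})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


# Example: a 3-second clip from the corpus video.
print(youtube_embed_url("YcvECxtXoxQ", 1120, 1123))
```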