
	---
	title: Image To Video Assignment
	emoji: 📚
	colorFrom: gray
	colorTo: blue
	sdk: gradio
	sdk_version: 6.6.0
	app_file: app.py
	pinned: false
	---

	# Assignment 2: Image-to-Video Semantic Retrieval via Object Detection

	This repository contains the deliverables for Assignment 2.

	## Files
	- `video_detections.parquet`: car-part detections indexed over the exterior-only segment of the input video.
	- `retrieval_clips.parquet`: for each query image, the returned 2–3 second clip window (timestamps plus a clickable YouTube link).

	## Processing Summary
	- Video ID (corpus): `YcvECxtXoxQ`
	- Exterior segment used: `18:39` to `24:43`
	- Frame sampling: 1 fps
	- Detector: YOLOv8 segmentation fine-tuned on Ultralytics `carparts-seg`
	- Inference thresholds: confidence = 0.25, IoU = 0.5
	- Retrieval: detect the top-K classes in the query image (K = 2), match them against the indexed detections with temporal smoothing, and return a fixed short clip (2–3 seconds).
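
The retrieval step above can be sketched roughly as follows. This is an illustrative sketch in plain Python, not the code that produced the Parquet files: the function names (`top_k_classes`, `hits_per_second`, `smooth`, `best_clip`), the moving-sum smoothing with a one-second radius, the 3-second clip length, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def top_k_classes(query_detections, k=2):
    """Return the k class labels with the highest confidence in the query image."""
    best = {}
    for label, score in query_detections:
        best[label] = max(score, best.get(label, 0.0))
    ranked = sorted(best.items(), key=lambda kv: -kv[1])
    return [label for label, _ in ranked[:k]]

def hits_per_second(index_rows, classes):
    """Count indexed detections of the wanted classes at each timestamp."""
    counts = defaultdict(int)
    for row in index_rows:
        if row["class_label"] in classes:
            counts[row["timestamp_sec"]] += 1
    return counts

def smooth(counts, start, end, radius=1):
    """Moving-sum smoothing (assumed): each second also gets credit from its neighbours."""
    return {
        t: sum(counts.get(t + d, 0) for d in range(-radius, radius + 1))
        for t in range(start, end + 1)
    }

def best_clip(smoothed, clip_len=3):
    """Pick the clip_len-second window with the highest smoothed support."""
    if not smoothed:
        return None
    times = sorted(smoothed)
    candidates = times[: max(1, len(times) - clip_len + 1)]
    start = max(
        candidates,
        key=lambda t: sum(smoothed.get(t + d, 0) for d in range(clip_len)),
    )
    return start, start + clip_len

# Made-up query detections (label, confidence) and indexed rows.
query = [("wheel", 0.91), ("front_door", 0.74), ("hood", 0.40), ("wheel", 0.55)]
index_rows = [
    {"timestamp_sec": 1119, "class_label": "hood"},
    {"timestamp_sec": 1120, "class_label": "wheel"},
    {"timestamp_sec": 1121, "class_label": "wheel"},
    {"timestamp_sec": 1121, "class_label": "front_door"},
    {"timestamp_sec": 1122, "class_label": "wheel"},
    {"timestamp_sec": 1123, "class_label": "hood"},
    {"timestamp_sec": 1124, "class_label": "wheel"},
]

classes = top_k_classes(query, k=2)
counts = hits_per_second(index_rows, classes)
clip = best_clip(smooth(counts, 1119, 1124))
print(classes, clip)  # ['wheel', 'front_door'] (1120, 1123)
```

Smoothing before picking the window favours stretches where the query classes appear in consecutive frames over isolated single-frame hits.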

	## Schema Description for the Parquet Files
	### video_detections.parquet schema
	Each row corresponds to a single detection in one sampled video frame.

	| Column | Type | Description |
	|---|---|---|
	| video_id | string | YouTube video id |
	| frame_index | int | sampled frame index from extracted frames |
	| timestamp_sec | int | timestamp (seconds) in original YouTube video timeline |
	| class_id | int | detector class id |
	| class_label | string | detected car-part label |
	| x_min | float | bounding box left |
	| y_min | float | bounding box top |
	| x_max | float | bounding box right |
	| y_max | float | bounding box bottom |
	| bounding_box | list[float] | `[x_min, y_min, x_max, y_max]` |
	| confidence_score | float | detection confidence |
	| detector_name | string | model identifier |
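
To illustrate how `frame_index`, `timestamp_sec`, and `bounding_box` relate, here is a minimal sketch of building one row. It assumes that frame 0 is the first frame of the exterior segment (`18:39`) and that, at 1 fps, each later frame advances the timestamp by one second; the README does not state this mapping explicitly. The helper names, the box values, and the `detector_name` string are hypothetical placeholders.

```python
def mmss_to_seconds(mmss):
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

SEGMENT_START_SEC = mmss_to_seconds("18:39")  # 18 * 60 + 39 = 1119
SEGMENT_END_SEC = mmss_to_seconds("24:43")    # 24 * 60 + 43 = 1483
FPS = 1  # frame sampling rate from the processing summary

def frame_to_timestamp(frame_index):
    """Map a sampled frame index to the original video timeline (assumed mapping)."""
    return SEGMENT_START_SEC + frame_index // FPS

def make_detection_row(frame_index, class_id, class_label, box, confidence):
    """Build one row with the columns listed in the schema table."""
    x_min, y_min, x_max, y_max = box
    return {
        "video_id": "YcvECxtXoxQ",
        "frame_index": frame_index,
        "timestamp_sec": frame_to_timestamp(frame_index),
        "class_id": class_id,
        "class_label": class_label,
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
        "bounding_box": [x_min, y_min, x_max, y_max],
        "confidence_score": confidence,
        "detector_name": "example-detector",  # placeholder, not the real identifier
    }

# Made-up detection in frame 10 of the segment.
row = make_detection_row(10, 0, "wheel", (12.0, 40.5, 180.0, 220.25), 0.87)
print(row["timestamp_sec"], row["bounding_box"])
```

Under these assumptions the segment spans seconds 1119 to 1483 of the original video.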

	### retrieval_clips.parquet schema
	Each row corresponds to one query image and the retrieved clip.

	| Column | Type | Description |
	|---|---|---|
	| video_id | string | YouTube video id |
	| clip_id | string | unique clip id (one per query row) |
	| query_index | int | query image row index in the HF dataset |
	| query_timestamp_sec | int | timestamp metadata from the query dataset (not used for retrieval) |
	| classes_in_query | list[string] | top-K detected classes from the query image |
	| query_top_class_labels | list[string] | top-5 query class labels |
	| query_top_class_scores | list[float] | top-5 query class confidence scores |
	| classes_used_for_retrieval | list[string] | classes used for matching (intersection or fallback) |
	| strategy | string | retrieval strategy used |
	| start_timestamp | int | returned clip start time (seconds) |
	| end_timestamp | int | returned clip end time (seconds) |
	| number_of_supporting_detections | int | support count from the matched segment before selecting the short clip |
	| youtube_embed_url | string | clickable YouTube embed link with start/end |
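
A `youtube_embed_url` value could be built from `video_id`, `start_timestamp`, and `end_timestamp` roughly as shown below. This is a sketch that assumes the standard YouTube embed URL form with the `start` and `end` player parameters; the exact URL format stored in the file may differ, and the function name and timestamps are hypothetical.

```python
from urllib.parse import urlencode

def youtube_embed_url(video_id, start_sec, end_sec):
    """Build an embed link that plays only the [start_sec, end_sec] window."""
    query = urlencode({"start": int(start_sec), "end": int(end_sec)})
    return f"https://www.youtube.com/embed/{video_id}?{query}"

# Made-up 3-second clip inside the exterior segment.
url = youtube_embed_url("YcvECxtXoxQ", 1120, 1123)
print(url)  # https://www.youtube.com/embed/YcvECxtXoxQ?start=1120&end=1123
```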
