# How to process video in a data loader
We assume that the video has been preprocessed into image files in advance. Usually we do not use all frames in a clip, but sample a window of a certain duration (e.g. 16 frames). The pipeline we assume for each clip is the following.
- Get a list of image paths for a clip, e.g. ["./video/clip1/frame0.jpg", ..., "./video/clip1/frame101.jpg"]
- Sample the duration we want to use, e.g. ["./video/clip1/frame11.jpg", ..., "./video/clip1/frame26.jpg"]
- Load each frame into a tensor shaped (T, H, W, C). H and W can be changed later.
- Use torchvision's built-in video transforms to crop, flip, etc. For example:
  - ToTensorVideo: from (T, H, W, C) to (C, T, H, W), from 0-255 to 0-1 (divide by 255), and from uint8 to float.
  - CenterCropVideo
  - RandomHorizontalFlipVideo
  - NormalizeVideo with Kinetics mean and std
  - See more: https://github.com/pytorch/vision/blob/f0d3daa7f65bcde560e242d9bccc284721368f02/torchvision/transforms/transforms_video.py
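The steps above can be sketched with plain tensor ops. This is a minimal sketch, not the torchvision implementation: the helper functions mimic what ToTensorVideo, CenterCropVideo, and NormalizeVideo do, the decoded frames are faked with random data instead of JPEG decoding, and the Kinetics mean/std values are assumed from the torchvision video references.

```python
import random
import torch

def sample_clip(frame_paths, duration=16):
    # Pick a random contiguous window of `duration` frame paths.
    start = random.randint(0, len(frame_paths) - duration)
    return frame_paths[start:start + duration]

def to_tensor_video(video):
    # (T, H, W, C) uint8 in [0, 255] -> (C, T, H, W) float in [0, 1],
    # mirroring what ToTensorVideo does.
    return video.permute(3, 0, 1, 2).float() / 255.0

def center_crop_video(video, size):
    # video: (C, T, H, W); crop the spatial center, like CenterCropVideo.
    _, _, h, w = video.shape
    top, left = (h - size) // 2, (w - size) // 2
    return video[..., top:top + size, left:left + size]

def normalize_video(video, mean, std):
    # Channel-wise normalization over (C, T, H, W), like NormalizeVideo.
    mean = torch.tensor(mean).view(-1, 1, 1, 1)
    std = torch.tensor(std).view(-1, 1, 1, 1)
    return (video - mean) / std

# Fake a decoded clip: 32 RGB frames of 128x171 (stand-in for loaded JPEGs).
paths = [f"./video/clip1/frame{i}.jpg" for i in range(32)]
frames = torch.randint(0, 256, (32, 128, 171, 3), dtype=torch.uint8)

window = sample_clip(paths, duration=16)       # 16 consecutive paths
clip = to_tensor_video(frames[:16])            # (3, 16, 128, 171), float
clip = center_crop_video(clip, 112)            # (3, 16, 112, 112)
clip = normalize_video(clip,
                       mean=(0.43216, 0.394666, 0.37645),   # assumed Kinetics mean
                       std=(0.22803, 0.22145, 0.216989))    # assumed Kinetics std
```

In practice you would compose the real torchvision transforms instead of these helpers; the sketch only shows the shape and dtype bookkeeping.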
Note that the first part differs from what the official PyTorch repository ( https://github.com/pytorch/vision/tree/master/references/video_classification ) does: we do not use the VideoClip class.
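Since we skip that class, the clip list and frame sampling live in our own Dataset. A hypothetical sketch (ClipDataset and fake_loader are names invented here; fake_loader stands in for real JPEG decoding, e.g. with PIL):

```python
import torch
from torch.utils.data import Dataset

def fake_loader(path):
    # Stand-in for real JPEG decoding; returns one (H, W, C) uint8 frame.
    return torch.randint(0, 256, (128, 171, 3), dtype=torch.uint8)

class ClipDataset(Dataset):
    """One item = one fixed-length frame window sampled from a clip."""

    def __init__(self, clips, duration=16, loader=fake_loader, transform=None):
        self.clips = clips          # list of clips, each a sorted list of frame paths
        self.duration = duration
        self.loader = loader
        self.transform = transform  # e.g. composed torchvision video transforms

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        paths = self.clips[idx]
        # Sample a random contiguous window of `duration` frames.
        start = torch.randint(0, len(paths) - self.duration + 1, ()).item()
        window = paths[start:start + self.duration]
        # Decode and stack frames into a (T, H, W, C) tensor.
        video = torch.stack([self.loader(p) for p in window])
        if self.transform is not None:
            video = self.transform(video)
        return video

clips = [[f"./video/clip1/frame{i}.jpg" for i in range(102)]]
ds = ClipDataset(clips)
video = ds[0]   # (16, 128, 171, 3) uint8 tensor before any transform
```

Such a dataset plugs directly into torch.utils.data.DataLoader for batching and shuffling.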