	## Dataset Configuration

	Create a TOML file to configure the datasets.

	Image and video datasets are supported. A configuration file can define multiple datasets. Each dataset is either an image dataset or a video dataset, and takes its captions from either caption text files or a metadata JSONL file.

	### Sample for Image Dataset with Caption Text Files

	```toml
	# resolution, caption_extension, batch_size, enable_bucket, bucket_no_upscale must be set in either general or datasets

	# general configurations
	[general]
	resolution = [960, 544]
	caption_extension = ".txt"
	batch_size = 1
	enable_bucket = true
	bucket_no_upscale = false

	[[datasets]]
	image_directory = "/path/to/image_dir"

	# other datasets can be added here. each dataset can have different configurations
	```
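The comment at the top of the sample says each key may be set in either `[general]` or `[[datasets]]`. One way to picture how the two sections combine is sketched below in Python. This is a minimal illustration and not the tool's actual loader: `resolve_datasets` and the second directory path are hypothetical, and the only rule assumed is that a key set on a dataset takes precedence over the same key in `[general]`.

```python
# Hypothetical helper (not part of the tool): treats [general] values as
# defaults that each [[datasets]] entry may override.

def resolve_datasets(config):
    general = config.get("general", {})
    resolved = []
    for dataset in config.get("datasets", []):
        # Keys set on the dataset win over keys set in [general].
        resolved.append({**general, **dataset})
    return resolved


# The parsed form of a configuration like the sample above, plus a second
# dataset (hypothetical path) that overrides batch_size.
config = {
    "general": {
        "resolution": [960, 544],
        "caption_extension": ".txt",
        "batch_size": 1,
        "enable_bucket": True,
        "bucket_no_upscale": False,
    },
    "datasets": [
        {"image_directory": "/path/to/image_dir"},
        {"image_directory": "/path/to/other_image_dir", "batch_size": 4},
    ],
}

for entry in resolve_datasets(config):
    print(entry["image_directory"], entry["batch_size"])
```

With this input, the first dataset keeps the general `batch_size` of 1 and the second uses its own value of 4.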

	### Sample for Image Dataset with Metadata JSONL File

	```toml
	# resolution, batch_size, enable_bucket, bucket_no_upscale must be set in either general or datasets
	# caption_extension is not required for metadata jsonl file
	# cache_directory is required for each dataset with metadata jsonl file

	# general configurations
	[general]
	resolution = [960, 544]
	batch_size = 1
	enable_bucket = true
	bucket_no_upscale = false

	[[datasets]]
	image_jsonl_file = "/path/to/metadata.jsonl"
	cache_directory = "/path/to/cache_directory"

	# other datasets can be added here. each dataset can have different configurations
	```

	JSONL file format for metadata:

	```json
	{"image_path": "/path/to/image1.jpg", "caption": "A caption for image1"}
	{"image_path": "/path/to/image2.jpg", "caption": "A caption for image2"}
	```
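If the captions already exist as text files next to the images, a metadata file in this format can be generated from them. The following Python sketch is one possible approach, not a utility shipped with the tool: the function names and the set of image extensions are assumptions, and images without a matching caption file are simply skipped.

```python
import json
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def build_image_metadata(image_dir, caption_extension=".txt"):
    """Pair each image with the caption file that shares its file stem."""
    records = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        caption_path = image_path.with_suffix(caption_extension)
        if not caption_path.exists():
            continue  # no caption file for this image, so skip it
        caption = caption_path.read_text(encoding="utf-8").strip()
        records.append({"image_path": str(image_path), "caption": caption})
    return records


def write_jsonl(records, output_path):
    """Write one JSON object per line, as in the format shown above."""
    with open(output_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A typical call would be `write_jsonl(build_image_metadata("/path/to/image_dir"), "/path/to/metadata.jsonl")`.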

	### Sample for Video Dataset with Caption Text Files

	```toml
	# resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, enable_bucket, bucket_no_upscale must be set in either general or datasets

	# general configurations
	[general]
	resolution = [960, 544]
	caption_extension = ".txt"
	batch_size = 1
	enable_bucket = true
	bucket_no_upscale = false

	[[datasets]]
	video_directory = "/path/to/video_dir"
	target_frames = [1, 25, 45]
	frame_extraction = "head"

	# other datasets can be added here. each dataset can have different configurations
	```

	### Sample for Video Dataset with Metadata JSONL File

	```toml
	# resolution, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, enable_bucket, bucket_no_upscale must be set in either general or datasets
	# caption_extension is not required for metadata jsonl file
	# cache_directory is required for each dataset with metadata jsonl file

	# general configurations
	[general]
	resolution = [960, 544]
	batch_size = 1
	enable_bucket = true
	bucket_no_upscale = false

	[[datasets]]
	video_jsonl_file = "/path/to/metadata.jsonl"
	target_frames = [1, 25, 45]
	frame_extraction = "head"
	cache_directory = "/path/to/cache_directory"

	# same metadata jsonl file can be used for multiple datasets
	[[datasets]]
	video_jsonl_file = "/path/to/metadata.jsonl"
	target_frames = [1]
	frame_stride = 10
	cache_directory = "/path/to/cache_directory"

	# other datasets can be added here. each dataset can have different configurations
	```

	JSONL file format for metadata:

	```json
	{"video_path": "/path/to/video1.mp4", "caption": "A caption for video1"}
	{"video_path": "/path/to/video2.mp4", "caption": "A caption for video2"}
	```
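Before pointing `video_jsonl_file` at a metadata file, it can help to check that every line matches this format. The Python sketch below is a hypothetical checker, not part of the tool. It assumes only what the example above shows: each line is a JSON object with string values for `video_path` and `caption`. Skipping blank lines is a choice made by this checker.

```python
import json


def check_video_metadata(lines, required_keys=("video_path", "caption")):
    """Return (line_number, problem) pairs for lines that do not match the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # this checker ignores blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for key in required_keys:
            if not isinstance(record.get(key), str):
                problems.append((number, f"missing or non-string key: {key}"))
    return problems


sample = [
    '{"video_path": "/path/to/video1.mp4", "caption": "A caption for video1"}',
    '{"video_path": "/path/to/video2.mp4"}',
]
print(check_video_metadata(sample))
# prints [(2, 'missing or non-string key: caption')]
```

To check a real file, pass the open file object, for example `check_video_metadata(open("/path/to/metadata.jsonl", encoding="utf-8"))`.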

	### frame_extraction Options

	- `head`: Extract the first N frames from the video.
	- `chunk`: Extract frames by splitting the video into consecutive chunks of N frames.
	- `slide`: Extract windows of N frames whose start positions advance by `frame_stride`.
	- `uniform`: Extract `frame_sample` clips of N frames each, spaced uniformly across the video.

	Here N is each value listed in `target_frames`. For example, consider a video with 40 frames. The following diagrams illustrate each extraction:

	```
	Original video, 40 frames: x = extracted frame, o = frame not extracted
	oooooooooooooooooooooooooooooooooooooooo

	head, target_frames = [1, 13, 25] -> extract head frames:
	xooooooooooooooooooooooooooooooooooooooo
	xxxxxxxxxxxxxooooooooooooooooooooooooooo
	xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo

	chunk, target_frames = [13, 25] -> split the video into chunks of 13 frames and chunks of 25 frames:
	xxxxxxxxxxxxxooooooooooooooooooooooooooo
	oooooooooooooxxxxxxxxxxxxxoooooooooooooo
	ooooooooooooooooooooooooooxxxxxxxxxxxxxo
	xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo

	NOTE: Do not include 1 in target_frames when frame_extraction is "chunk". Doing so causes every frame to be extracted.

	slide, target_frames = [1, 13, 25], frame_stride = 10 -> extract N frames with a stride of 10:
	xooooooooooooooooooooooooooooooooooooooo
	ooooooooooxooooooooooooooooooooooooooooo
	ooooooooooooooooooooxooooooooooooooooooo
	ooooooooooooooooooooooooooooooxooooooooo
	xxxxxxxxxxxxxooooooooooooooooooooooooooo
	ooooooooooxxxxxxxxxxxxxooooooooooooooooo
	ooooooooooooooooooooxxxxxxxxxxxxxooooooo
	xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
	ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo

	uniform, target_frames = [1, 13, 25], frame_sample = 4 -> extract `frame_sample` samples uniformly, N frames each:
	xooooooooooooooooooooooooooooooooooooooo
	oooooooooooooxoooooooooooooooooooooooooo
	oooooooooooooooooooooooooxoooooooooooooo
	ooooooooooooooooooooooooooooooooooooooox
	xxxxxxxxxxxxxooooooooooooooooooooooooooo
	oooooooooxxxxxxxxxxxxxoooooooooooooooooo
	ooooooooooooooooooxxxxxxxxxxxxxooooooooo
	oooooooooooooooooooooooooooxxxxxxxxxxxxx
	xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
	oooooxxxxxxxxxxxxxxxxxxxxxxxxxoooooooooo
	ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo
	oooooooooooooooxxxxxxxxxxxxxxxxxxxxxxxxx
	```
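The four modes can also be described by where each extracted clip starts. The Python sketch below is a reading of the diagrams above, not the tool's actual code. The function name is hypothetical, and two details are assumptions: lengths longer than the video are skipped, and `uniform` spaces its start positions evenly with integer rounding, so an individual start position may differ from the tool's by a frame.

```python
def start_indices(frame_count, target_frames, mode="head", frame_stride=1, frame_sample=1):
    """Return (start, length) pairs, one per extracted clip."""
    clips = []
    for length in target_frames:
        if length > frame_count:
            continue  # assumption: lengths longer than the video are skipped
        last_start = frame_count - length  # latest start that still fits
        if mode == "head":
            starts = [0]
        elif mode == "chunk":
            # Consecutive, non-overlapping chunks of `length` frames.
            starts = range(0, last_start + 1, length)
        elif mode == "slide":
            # Windows whose start advances by frame_stride.
            starts = range(0, last_start + 1, frame_stride)
        elif mode == "uniform":
            if frame_sample == 1:
                starts = [0]  # a single sample behaves like "head"
            else:
                # Assumed rule: evenly spaced starts from 0 to last_start.
                starts = [last_start * i // (frame_sample - 1) for i in range(frame_sample)]
        else:
            raise ValueError(f"unknown frame_extraction: {mode}")
        clips.extend((start, length) for start in starts)
    return clips


print(start_indices(40, [13, 25], mode="chunk"))
# prints [(0, 13), (13, 13), (26, 13), (0, 25)]
```

With `mode="chunk"` and a length of 1, every frame becomes its own chunk, which is why the note above advises against including 1 in `target_frames` for that mode.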

	## Specifications

	```toml
	# general configurations
	[general]
	resolution = [960, 544] # optional, [W, H], default is None. This is the default resolution for all datasets
	caption_extension = ".txt" # optional, default is None. This is the default caption extension for all datasets
	batch_size = 1 # optional, default is 1. This is the default batch size for all datasets
	enable_bucket = true # optional, default is false. Enable bucketing for datasets
	bucket_no_upscale = false # optional, default is false. Disable upscaling for bucketing. Ignored if enable_bucket is false

	### Image Dataset

	# sample image dataset with caption text files
	[[datasets]]
	image_directory = "/path/to/image_dir"
	caption_extension = ".txt" # required for caption text files, if general caption extension is not set
	resolution = [960, 544] # required if general resolution is not set
	batch_size = 4 # optional, overrides the default batch size
	enable_bucket = false # optional, overrides the default bucketing setting
	bucket_no_upscale = true # optional, overrides the default bucket_no_upscale setting
	cache_directory = "/path/to/cache_directory" # optional, default is None to use the same directory as the image directory. NOTE: caching is always enabled

	# sample image dataset with metadata jsonl file
	[[datasets]]
	image_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of image files and captions
	resolution = [960, 544] # required if general resolution is not set
	cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
	# caption_extension is not required for metadata jsonl file
	# batch_size, enable_bucket, bucket_no_upscale are also available for metadata jsonl file

	### Video Dataset

	# sample video dataset with caption text files
	[[datasets]]
	video_directory = "/path/to/video_dir"
	caption_extension = ".txt" # required for caption text files, if general caption extension is not set
	resolution = [960, 544] # required if general resolution is not set

	target_frames = [1, 25, 81] # required for video dataset. list of clip lengths (in frames) to extract. each element must be N*4+1 (N=0,1,2,...)

	# NOTE: Do not include 1 in target_frames when frame_extraction is "chunk". Doing so causes every frame to be extracted.

	frame_extraction = "head" # optional, one of "head", "chunk", "slide", "uniform". Default is "head"
	frame_stride = 1 # optional, default is 1, available for "slide" frame extraction
	frame_sample = 4 # optional, default is 1 (same as "head"), available for "uniform" frame extraction
	# batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for video dataset

	# sample video dataset with metadata jsonl file
	[[datasets]]
	video_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of video files and captions

	target_frames = [1, 81]

	cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
	# frame_extraction, frame_stride, frame_sample are also available for metadata jsonl file
	```
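The requirements stated in the comments above can be checked mechanically before training. The Python sketch below is a hypothetical validator, not something the tool provides. It expects a single dataset entry as a dict with any `[general]` defaults already applied, and it checks only the rules written in the specification comments.

```python
def validate_dataset(dataset):
    """Collect violations of the rules stated in the specification comments."""
    errors = []
    is_video = "video_directory" in dataset or "video_jsonl_file" in dataset
    uses_jsonl = "image_jsonl_file" in dataset or "video_jsonl_file" in dataset

    if dataset.get("resolution") is None:
        errors.append("resolution must be set in [general] or in the dataset")
    if uses_jsonl and not dataset.get("cache_directory"):
        errors.append("cache_directory is required with a metadata JSONL file")
    if not uses_jsonl and dataset.get("caption_extension") is None:
        errors.append("caption_extension is required with caption text files")

    if is_video:
        target_frames = dataset.get("target_frames")
        if not target_frames:
            errors.append("target_frames is required for video datasets")
        else:
            # Each entry must have the form N*4+1 with N = 0, 1, 2, ...
            bad = [n for n in target_frames if n < 1 or (n - 1) % 4 != 0]
            if bad:
                errors.append(f"target_frames entries must be N*4+1: {bad}")
            if dataset.get("frame_extraction", "head") == "chunk" and 1 in target_frames:
                errors.append('target_frames must not include 1 with frame_extraction "chunk"')
    return errors


example = {
    "video_jsonl_file": "/path/to/metadata.jsonl",
    "resolution": [960, 544],
    "target_frames": [1, 24],
    "frame_extraction": "chunk",
}
for message in validate_dataset(example):
    print(message)
```

The example entry is deliberately wrong in three ways: it has no `cache_directory`, 24 is not of the form N*4+1, and it combines 1 with `"chunk"`.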

	<!--
	# sample image dataset with lance
	[[datasets]]
	image_lance_dataset = "/path/to/lance_dataset"
	resolution = [960, 544] # required if general resolution is not set
	# batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for lance dataset
	-->

	Metadata in .json files will be supported in the near future.


