1. Download Datasets and Unzip
Download WebFace42M from https://www.face-benchmark.org/download.html.
The raw data of WebFace42M will have 10 directories after being unarchived. The smaller subsets use fewer of them:
WebFace4M contains 1 directory: 0.
WebFace12M contains 3 directories: 0, 1, 2.
WebFace42M contains 10 directories: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
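A quick check after unarchiving can catch an incomplete download. The sketch below assumes the numbered directories sit directly under the folder you pass in; `EXPECTED_DIRS` and `missing_dirs` are illustrative names, not part of any official tooling.

```python
from pathlib import Path

# Number of top-level directories per subset, as listed above.
EXPECTED_DIRS = {
    "WebFace4M": 1,
    "WebFace12M": 3,
    "WebFace42M": 10,
}


def missing_dirs(root, subset):
    """Return the numbered directories that `subset` needs but `root` lacks."""
    expected = {str(i) for i in range(EXPECTED_DIRS[subset])}
    present = {p.name for p in Path(root).iterdir() if p.is_dir()}
    return sorted(expected - present, key=int)
```

Calling `missing_dirs("/path/to/unarchived", "WebFace42M")` returns an empty list when all ten directories are present, and the names of the absent ones otherwise.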
2. Create Shuffled Rec File for DALI
Note: DALI needs a shuffled rec file; an unshuffled rec file can cause performance degradation. The original insightface-style rec file does not support NVIDIA DALI, so you must generate a shuffled rec file with mxnet.tools.im2rec, using the commands below.
# Directories and files of your dataset
/WebFace42M_Root
├── 0_0_0000000
│   ├── 0_0.jpg
│   ├── 0_1.jpg
│   ├── 0_2.jpg
│   ├── 0_3.jpg
│   └── 0_4.jpg
├── 0_0_0000001
│   ├── 0_5.jpg
│   ├── 0_6.jpg
│   ├── 0_7.jpg
│   ├── 0_8.jpg
│   └── 0_9.jpg
├── 0_0_0000002
│   ├── 0_10.jpg
│   ├── 0_11.jpg
│   ├── 0_12.jpg
│   ├── 0_13.jpg
│   ├── 0_14.jpg
│   ├── 0_15.jpg
│   ├── 0_16.jpg
│   └── 0_17.jpg
├── 0_0_0000003
│   ├── 0_18.jpg
│   ├── 0_19.jpg
│   └── 0_20.jpg
└── 0_0_0000004
# 1) Create train.lst with the following command
python -m mxnet.tools.im2rec --list --recursive train WebFace42M_Root
# 2) Create train.rec and train.idx from train.lst with the following command
python -m mxnet.tools.im2rec --num-thread 16 --quality 100 train WebFace42M_Root
Finally, you will get three files: train.lst, train.rec, and train.idx. Only train.rec and train.idx are used for training.
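Since train.rec is built from train.lst, inspecting the list is a cheap way to confirm the shuffle before a long training run. The sketch below assumes the tab-separated layout that im2rec writes (index, label, relative path on each line) and treats a non-decreasing label column as a sign that the list is still grouped by identity. The helper names `read_labels` and `looks_shuffled` are hypothetical.

```python
def read_labels(lst_path):
    """Read the label column from an im2rec-style .lst file."""
    labels = []
    with open(lst_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            # Assumed layout: index, label, relative path
            labels.append(float(fields[1]))
    return labels


def looks_shuffled(labels):
    """Return False when labels never decrease, i.e. the list is grouped by class."""
    return any(a > b for a, b in zip(labels, labels[1:]))
```

For example, `looks_shuffled(read_labels("train.lst"))` should be true for a list produced by the command in step 1. This is only a heuristic: it detects a fully sorted list, not a poorly mixed one.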