## 1. Download Datasets and Unzip
Download WebFace42M from [https://www.face-benchmark.org/download.html](https://www.face-benchmark.org/download.html).
The raw data of `WebFace42M` will have 10 directories after being unarchived:
`WebFace4M` contains 1 directory: `0`.
`WebFace12M` contains 3 directories: `0,1,2`.
`WebFace42M` contains 10 directories: `0,1,2,3,4,5,6,7,8,9`.
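A quick way to confirm the archives were fully extracted is to check for the directories listed above. The following is a minimal sketch, not part of the original instructions; the helper name `missing_dirs` and the `EXPECTED_DIRS` table are hypothetical, and it assumes the numbered directories sit directly under the folder you pass in.

```python
from pathlib import Path

# Expected top-level directories per subset, taken from the list above.
EXPECTED_DIRS = {
    "WebFace4M": ["0"],
    "WebFace12M": ["0", "1", "2"],
    "WebFace42M": [str(i) for i in range(10)],
}


def missing_dirs(root, subset="WebFace42M"):
    """Return the expected top-level directories that are absent under root.

    Hypothetical helper: assumes the numbered directories are direct
    children of `root`.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS[subset] if not (root / d).is_dir()]
```

An empty list from `missing_dirs("/path/to/unzipped", "WebFace12M")` suggests that every expected directory for that subset is present.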
## 2. Create Shuffled Rec File for DALI
Note: A shuffled rec file is very important for DALI; an unshuffled rec file can cause performance degradation. The original InsightFace-style rec file does not support NVIDIA DALI, so you must use [mxnet.tools.im2rec](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py) to generate a shuffled rec file.
```shell
# directories and files of your dataset
/WebFace42M_Root
├── 0_0_0000000
│   ├── 0_0.jpg
│   ├── 0_1.jpg
│   ├── 0_2.jpg
│   ├── 0_3.jpg
│   └── 0_4.jpg
├── 0_0_0000001
│   ├── 0_5.jpg
│   ├── 0_6.jpg
│   ├── 0_7.jpg
│   ├── 0_8.jpg
│   └── 0_9.jpg
├── 0_0_0000002
│   ├── 0_10.jpg
│   ├── 0_11.jpg
│   ├── 0_12.jpg
│   ├── 0_13.jpg
│   ├── 0_14.jpg
│   ├── 0_15.jpg
│   ├── 0_16.jpg
│   └── 0_17.jpg
├── 0_0_0000003
│   ├── 0_18.jpg
│   ├── 0_19.jpg
│   └── 0_20.jpg
├── 0_0_0000004
# 1) create train.lst using the following command
python -m mxnet.tools.im2rec --list --recursive train WebFace42M_Root
# 2) create train.rec and train.idx from train.lst using the following command
python -m mxnet.tools.im2rec --num-thread 16 --quality 100 train WebFace42M_Root
```
Finally, you will get three files: `train.lst`, `train.rec`, and `train.idx`. Only `train.rec` and `train.idx` are used for training.
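Since shuffling matters for DALI, it can be worth sanity-checking `train.lst` before building the rec file. The sketch below is an illustration, not part of the original workflow. It assumes each line of the list file is tab-separated as index, label, relative path (the layout im2rec's list mode is expected to write), and the function name `adjacent_same_label_ratio` is hypothetical.

```python
def adjacent_same_label_ratio(lst_path):
    """Fraction of consecutive .lst entries that share the same label.

    Assumed line layout (tab-separated): index, label, relative path.
    """
    labels = []
    with open(lst_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            labels.append(fields[1])
    if len(labels) < 2:
        return 0.0
    # Count neighbouring pairs whose labels are identical.
    same = sum(a == b for a, b in zip(labels, labels[1:]))
    return same / (len(labels) - 1)
```

A list sorted by identity gives a ratio close to 1, while a well-shuffled list with many identities should give a ratio close to 0. For example, `adjacent_same_label_ratio("train.lst")` can be run right after step 1.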