python profile.py -f -j 2

Type             Time (%)  Time (ms)  Calls  Avg       Min       Max       Name
GPU activities:  20.96     3.5466     24     147.77us  68.544us  243.27us  void spatialDepthwiseConvolutionUpdateOutput, ...
GPU activities:  4.94      0.83662    48     17.429us  7.5840us  28.129us  void indexSelectSmallIndex,...
GPU activities:  2.81      0.47514    24     19.797us  2.7840us  55.105us  void CatArrayBatchedCopy,...
GPU activities:  1.25      0.21111    6      35.184us  15.904us  54.657us  void indexSelectSmallIndex,...
GPU activities:  1.2       0.20346    48     4.2380us  4.0640us  4.4160us  void kernelReduceAll,...
GPU activities:  0.51      0.086562   80     1.0820us  800ns     1.7920us  void thrust::cuda_cub::core::_kernel_agent,...
GPU activities:  0.22      0.037281   6      6.2130us  2.1760us  10.337us  void kernelPointwiseApply2...
GPU activities:  0.04      0.006912   3      2.3040us  2.2080us  2.4960us  void kernelReduceAllPass2
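
The Avg column in the output above should equal Time (ms) divided by Calls. The following is a minimal sketch of that sanity check, not part of profile.py. It assumes the rows are copied by hand into tuples; `ROWS`, `avg_us`, and `mismatches` are hypothetical names introduced for illustration, and the kernel names are shortened because the output truncates them.

```python
# Rows copied by hand from the profile output: (name, time_ms, calls, avg_us).
# Names are abbreviated; the original output truncates the kernel signatures.
ROWS = [
    ("spatialDepthwiseConvolutionUpdateOutput", 3.5466, 24, 147.77),
    ("indexSelectSmallIndex", 0.83662, 48, 17.429),
    ("CatArrayBatchedCopy", 0.47514, 24, 19.797),
    ("indexSelectSmallIndex", 0.21111, 6, 35.184),
    ("kernelReduceAll", 0.20346, 48, 4.2380),
    ("thrust::cuda_cub::core::_kernel_agent", 0.086562, 80, 1.0820),
    ("kernelPointwiseApply2", 0.037281, 6, 6.2130),
    ("kernelReduceAllPass2", 0.006912, 3, 2.3040),
]


def avg_us(time_ms, calls):
    """Average time per call in microseconds, from total milliseconds."""
    return time_ms * 1000.0 / calls


def mismatches(rows, tol_us=0.01):
    """Return names of rows whose reported Avg differs from Time / Calls."""
    return [
        name
        for name, time_ms, calls, reported_avg in rows
        if abs(avg_us(time_ms, calls) - reported_avg) > tol_us
    ]


print(mismatches(ROWS))  # prints [] when every row is self-consistent
```

The tolerance of 0.01 us is an assumption chosen to absorb the rounding in the printed values, for example 3.5466 ms / 24 calls = 147.775 us against a reported 147.77us.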