python profile.py -f -j 1

Type             Time (%)  Time (ms)  Calls       Avg       Min       Max  Name
GPU activities:     25.49     11.801     60  196.68us  158.82us  243.43us  void spatialDepthwiseConvolutionUpdateOutput, ...
GPU activities:      6.95     3.2163    120  26.802us  25.440us  28.544us  void indexSelectSmallIndex,...
GPU activities:      3.11     1.4384     30  47.946us  43.745us  49.376us  void kernelPointwiseApply2,...
GPU activities:      1.93    0.89537     30  29.845us  28.608us  31.297us  void kernelPointwiseApply3,...
GPU activities:      1.76    0.81451     15  54.300us  53.377us  55.361us  void indexSelectSmallIndex,...
GPU activities:      0.51    0.23428     60  3.9040us  3.8080us  4.1280us  void kernelReduceAll,...
GPU activities:      0.25    0.11347     15  7.5640us     832ns  20.865us  void kernelPointwiseApply1...
GPU activities:      0.21    0.09632     60  1.6050us  1.5680us  1.6960us  void kernelPointwiseApply2