python profile.py --dwt --no_grad -j 3
Type            Time (%)  Time (ms)  Calls        Avg        Min        Max  Name
GPU activities:    77.19     4.4973     30   149.91us   32.001us   330.34us  void spatialDepthwiseConvolutionUpdateOutput<f...
GPU activities:    10.67    0.62183     35   17.766us      896ns   576.52us  [CUDA memcpy HtoD]
GPU activities:     6.22    0.36215     30   12.071us   1.6000us   37.793us  void kernelPointwiseApply2<CopyOp<float, float...
GPU activities:     3.83    0.22304     30   7.4340us   5.4400us   9.4090us  void CatArrayBatchedCopy<float, unsigned int, ...
GPU activities:     1.74    0.10144      4   25.360us   5.2480us   65.729us  void kernelReduceAllPass1<float, unsigned int,...
GPU activities:     0.14   0.008064      4   2.0160us   1.5040us   2.6880us  void kernelReduceAllPass2<float, ReduceAdd<flo...
GPU activities:     0.08   0.004832      4   1.2080us   1.0880us   1.3440us  [CUDA memcpy DtoH]
GPU activities:     0.07   0.003904      4      976ns      928ns   1.1200us  void kernelPointwiseApply1<TensorDivConstantOp...
GPU activities:     0.06   0.003584      4      896ns      800ns   1.1840us  void kernelPointwiseApply1<TensorFillOp<float>...
Total:               100    5.82614    145
Total (no mem):    89.25    5.19948    106
API calls:         97.54    4286.65     11   389.70ms   13.121us   4.28252s  cudaMalloc
API calls:          2.09     91.873      1   91.873ms   91.873ms   91.873ms  cudaDeviceSynchronize
API calls:          0.11     5.0274    185   27.174us      128ns   1.1468ms  cuDeviceGetAttribute
API calls:          0.11     4.9842      2   2.4921ms   2.4869ms   2.4973ms  cudaGetDeviceProperties
API calls:          0.03     1.4266      3   475.54us   14.329us   1.3919ms  cudaHostAlloc
API calls:          0.03     1.2589    106   11.876us   6.3590us   35.621us  cudaLaunch
API calls:          0.03     1.1159     39   28.613us   4.9750us   645.78us  cudaMemcpyAsync
API calls:          0.01    0.54757   1583      345ns      270ns   5.1210us  cudaGetDevice
API calls:          0.01    0.48097      2   240.48us   240.19us   240.78us  cuDeviceGetName
API calls:          0.01    0.29629      2   148.14us   145.37us   150.92us  cuDeviceTotalMem
API calls:          0.01    0.29215    705      414ns      328ns   2.9310us  cudaSetDevice
API calls:             0    0.20274    938      216ns      173ns   9.9390us  cudaSetupArgument
API calls:             0    0.12244      9   13.604us   1.8180us   90.643us  cudaStreamSynchronize
API calls:             0    0.10032     51   1.9670us      639ns   3.9840us  cudaEventQuery
API calls:             0   0.048559     30   1.6180us   1.2770us   7.3410us  cudaEventCreateWithFlags
API calls:             0   0.040128     30   1.3370us   1.2040us   2.5630us  cudaEventRecord
API calls:             0   0.034676     27   1.2840us      551ns   1.9660us  cudaEventDestroy
API calls:             0   0.031193    170      183ns      110ns      465ns  cudaGetLastError
API calls:             0    0.02614    106      246ns      140ns      700ns  cudaConfigureCall
API calls:             0   0.003345     13      257ns      104ns      797ns  cudaGetDeviceCount
API calls:             0   0.001646      4      411ns      122ns   1.1040us  cuDeviceGetCount
API calls:             0   0.001009      3      336ns      140ns      637ns  cuDeviceGet
API calls:             0   0.000564      1      564ns      564ns      564ns  cuInit
API calls:             0    0.00055      1      550ns      550ns      550ns  cuDriverGetVersion
Total:             99.98    4394.57   4022