`python profile.py -f -j 1`
GPU activities:

| Time (%) | Time (ms) | Calls | Avg | Min | Max | Name |
|---:|---:|---:|---:|---:|---:|:---|
| 25.49 | 11.801 | 60 | 196.68us | 158.82us | 243.43us | `void spatialDepthwiseConvolutionUpdateOutput<f...` |
| 13.21 | 6.117 | 255 | 23.988us | 992ns | 2.6960ms | |
| 12.44 | 5.7582 | 60 | 95.970us | 94.145us | 98.721us | `void kernelPointwiseApply2<TensorTakeOp<float,...` |
| 8.65 | 4.0074 | 100 | 40.073us | 1.7280us | 99.490us | `void kernelPointwiseApply3<TensorAddOp<long>, ...` |
| 6.95 | 3.2163 | 120 | 26.802us | 25.440us | 28.544us | `void indexSelectSmallIndex<float, unsigned int...` |
| 6.71 | 3.1092 | 90 | 34.546us | 20.096us | 57.025us | `void kernelPointwiseApply2<CopyOp<float, float...` |
| 5.8 | 2.6862 | 140 | 19.186us | 864ns | 583.46us | |
| 3.66 | 1.6945 | 60 | 28.241us | 15.296us | 66.369us | `void kernelPointwiseApply3<TensorAddOp<float>,...` |
| 3.11 | 1.4384 | 30 | 47.946us | 43.745us | 49.376us | `void kernelPointwiseApply2<TensorDivConstantOp...` |
| 3.04 | 1.4067 | 90 | 15.630us | 11.328us | 27.552us | `void kernelPointwiseApply2<CopyOp<float, float...` |
| 2.08 | 0.96206 | 30 | 32.068us | 28.800us | 35.840us | `void kernelPointwiseApply3<TensorAddOp<float>,...` |
| 1.93 | 0.89537 | 30 | 29.845us | 28.608us | 31.297us | `void kernelPointwiseApply3<TensorSubOp<float>,...` |
| 1.76 | 0.81451 | 15 | 54.300us | 53.377us | 55.361us | `void indexSelectSmallIndex<float, unsigned int...` |
| 1.7 | 0.78772 | 10 | 78.772us | 67.008us | 90.689us | `void kernelReduceAllPass1<float, unsigned int,...` |
| 0.52 | 0.23917 | 60 | 3.9860us | 3.8410us | 4.1920us | `void kernelReduceAll<long, unsigned int, long,...` |
| 0.51 | 0.23808 | 15 | 15.872us | 15.616us | 16.128us | `void kernelPointwiseApply3<TensorSubOp<float>,...` |
| 0.51 | 0.23428 | 60 | 3.9040us | 3.8080us | 4.1280us | `void kernelReduceAll<long, unsigned int, long,...` |
| 0.34 | 0.15543 | 15 | 10.361us | 10.080us | 10.625us | `void kernelPointwiseApply2<Tensor_neg_Float_Op...` |
| 0.32 | 0.14679 | 120 | 1.2230us | 928ns | 1.5050us | `void kernelPointwiseApply2<TensorMulConstantOp...` |
| 0.27 | 0.1241 | 100 | 1.2400us | 800ns | 1.9840us | `void thrust::cuda_cub::core::_kernel_agent<thr...` |
| 0.26 | 0.11875 | 120 | 989ns | 768ns | 1.2800us | `void kernelPointwiseApply1<TensorFillOp<long>,...` |
| 0.25 | 0.11347 | 15 | 7.5640us | 832ns | 20.865us | `void kernelPointwiseApply1<TensorFillOp<float>...` |
| 0.21 | 0.09632 | 60 | 1.6050us | 1.5680us | 1.6960us | `void kernelPointwiseApply2<TensorRemainderOp<l...` |
| 0.2 | 0.090593 | 60 | 1.5090us | 1.3440us | 2.4330us | `void kernelPointwiseApply2<CopyOp<float, float...` |
| 0.06 | 0.026977 | 10 | 2.6970us | 2.2080us | 3.2330us | `void kernelReduceAllPass2<float, ReduceAdd<flo...` |
| 0.03 | 0.013633 | 15 | 908ns | 832ns | 1.0880us | |
| 0.03 | 0.011872 | 10 | 1.1870us | 960ns | 1.5040us | `void kernelPointwiseApply1<TensorDivConstantOp...` |
| 100.04 | 46.304 | 1750 | | | | **Total** |
| 81.03 | 37.5008 | 1355 | | | | **Total (no mem)** |

API calls:

| Time (%) | Time (ms) | Calls | Avg | Min | Max | Name |
|---:|---:|---:|---:|---:|---:|:---|
| 95.36 | 4330.04 | 13 | 333.08ms | 16.022us | 4.32040s | `cudaMalloc` |
| 3 | 136.03 | 1 | 136.03ms | 136.03ms | 136.03ms | `cudaDeviceSynchronize` |
| 0.51 | 23.082 | 1340 | 17.225us | 5.7790us | 136.37us | `cudaLaunch` |
| 0.45 | 20.316 | 395 | 51.431us | 5.1510us | 4.4311ms | `cudaMemcpyAsync` |
| 0.18 | 8.1313 | 15642 | 519ns | 254ns | 18.543us | `cudaGetDevice` |
| 0.11 | 5.0693 | 185 | 27.401us | 126ns | 1.1465ms | `cuDeviceGetAttribute` |
| 0.11 | 5.0058 | 2 | 2.5029ms | 2.4990ms | 2.5068ms | `cudaGetDeviceProperties` |
| 0.11 | 4.9763 | 395 | 12.598us | 1.6840us | 223.54us | `cudaStreamSynchronize` |
| 0.07 | 3.0304 | 4947 | 612ns | 279ns | 15.640us | `cudaSetDevice` |
| 0.06 | 2.5076 | 6885 | 364ns | 109ns | 705.44us | `cudaSetupArgument` |
| 0.01 | 0.5482 | 100 | 5.4820us | 2.8720us | 15.523us | `cudaFuncGetAttributes` |
| 0.01 | 0.48326 | 2 | 241.63us | 240.94us | 242.32us | `cuDeviceGetName` |
| 0.01 | 0.45813 | 1340 | 341ns | 146ns | 2.5300us | `cudaConfigureCall` |
| 0.01 | 0.39718 | 1290 | 307ns | 120ns | 2.0540us | `cudaGetLastError` |
| 0.01 | 0.39126 | 15 | 26.083us | 3.9020us | 133.31us | `cudaMemsetAsync` |
| 0.01 | 0.29644 | 2 | 148.22us | 145.40us | 151.04us | `cuDeviceTotalMem` |
| 0 | 0.091542 | 100 | 915ns | 419ns | 1.8570us | `cudaDeviceGetAttribute` |
| 0 | 0.053599 | 200 | 267ns | 105ns | 581ns | `cudaPeekAtLastError` |
| 0 | 0.004603 | 14 | 328ns | 102ns | 1.3770us | `cudaGetDeviceCount` |
| 0 | 0.001982 | 4 | 495ns | 140ns | 1.3530us | `cuDeviceGetCount` |
| 0 | 0.001158 | 3 | 386ns | 129ns | 841ns | `cuDeviceGet` |
| 0 | 0.000678 | 1 | 678ns | 678ns | 678ns | `cuInit` |
| 0 | 0.00047 | 1 | 470ns | 470ns | 470ns | `cuDriverGetVersion` |
| 100.02 | 4540.92 | 32877 | | | | **Total** |
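The summary rows can be cross-checked against the per-row data. The Python sketch below is illustrative and is not part of the original profiling script. It hard-codes the GPU-activities rows and a few API-call rows from the profile above, converts the unit-suffixed Avg values (`s`, `ms`, `us`, `ns`) to milliseconds, checks that Avg × Calls reproduces Time (ms) to within about 1 %, and recomputes both GPU total rows. Which unnamed rows "Total (no mem)" leaves out is an assumption inferred from the arithmetic (100.04 - 13.21 - 5.8 = 81.03 and 1750 - 255 - 140 = 1355); the source does not state it. The helper names (`to_ms`, `summarize`, `failing_rows`) and the `kind` labels are hypothetical.

```python
import math

# Multipliers from the duration suffixes used in the Avg/Min/Max columns to ms.
UNIT_TO_MS = {"s": 1000.0, "ms": 1.0, "us": 1e-3, "ns": 1e-6}


def to_ms(value: str) -> float:
    """Convert a duration such as '196.68us', '992ns' or '4.32040s' to ms."""
    # Two-letter suffixes are tried first so '12ms' is not matched by bare 's'.
    for suffix in ("ms", "us", "ns", "s"):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * UNIT_TO_MS[suffix]
    raise ValueError(f"unrecognised duration: {value!r}")


# GPU-activities rows in table order: (Time %, Time ms, Calls, Avg, kind).
# Hypothetical "kind" labels: "kernel" = named row; "unnamed_mem" = unnamed row
# that "Total (no mem)" appears to exclude; "unnamed_other" = the remaining one.
GPU_ROWS = [
    (25.49, 11.801, 60, "196.68us", "kernel"),
    (13.21, 6.117, 255, "23.988us", "unnamed_mem"),
    (12.44, 5.7582, 60, "95.970us", "kernel"),
    (8.65, 4.0074, 100, "40.073us", "kernel"),
    (6.95, 3.2163, 120, "26.802us", "kernel"),
    (6.71, 3.1092, 90, "34.546us", "kernel"),
    (5.8, 2.6862, 140, "19.186us", "unnamed_mem"),
    (3.66, 1.6945, 60, "28.241us", "kernel"),
    (3.11, 1.4384, 30, "47.946us", "kernel"),
    (3.04, 1.4067, 90, "15.630us", "kernel"),
    (2.08, 0.96206, 30, "32.068us", "kernel"),
    (1.93, 0.89537, 30, "29.845us", "kernel"),
    (1.76, 0.81451, 15, "54.300us", "kernel"),
    (1.7, 0.78772, 10, "78.772us", "kernel"),
    (0.52, 0.23917, 60, "3.9860us", "kernel"),
    (0.51, 0.23808, 15, "15.872us", "kernel"),
    (0.51, 0.23428, 60, "3.9040us", "kernel"),
    (0.34, 0.15543, 15, "10.361us", "kernel"),
    (0.32, 0.14679, 120, "1.2230us", "kernel"),
    (0.27, 0.1241, 100, "1.2400us", "kernel"),
    (0.26, 0.11875, 120, "989ns", "kernel"),
    (0.25, 0.11347, 15, "7.5640us", "kernel"),
    (0.21, 0.09632, 60, "1.6050us", "kernel"),
    (0.2, 0.090593, 60, "1.5090us", "kernel"),
    (0.06, 0.026977, 10, "2.6970us", "kernel"),
    (0.03, 0.013633, 15, "908ns", "unnamed_other"),
    (0.03, 0.011872, 10, "1.1870us", "kernel"),
]

# A subset of the API-calls rows: name -> (Time ms, Calls, Avg).
API_ROWS = {
    "cudaMalloc": (4330.04, 13, "333.08ms"),
    "cudaLaunch": (23.082, 1340, "17.225us"),
    "cudaMemcpyAsync": (20.316, 395, "51.431us"),
    "cudaMemsetAsync": (0.39126, 15, "26.083us"),
}
API_TOTAL_MS = 4540.92


def summarize() -> dict:
    """Recompute the GPU total rows plus a few derived call counts."""
    no_mem = [r for r in GPU_ROWS if r[4] != "unnamed_mem"]
    return {
        "total": (round(sum(r[0] for r in GPU_ROWS), 2),
                  round(sum(r[1] for r in GPU_ROWS), 3),
                  sum(r[2] for r in GPU_ROWS)),
        "total_no_mem": (round(sum(r[0] for r in no_mem), 2),
                         round(sum(r[1] for r in no_mem), 4),
                         sum(r[2] for r in no_mem)),
        "kernel_calls": sum(r[2] for r in GPU_ROWS if r[4] == "kernel"),
        "mem_calls": sum(r[2] for r in GPU_ROWS if r[4] == "unnamed_mem"),
        "malloc_share_pct": round(100 * API_ROWS["cudaMalloc"][0] / API_TOTAL_MS, 2),
    }


def avg_matches(time_ms: float, calls: int, avg: str, rel_tol: float = 0.01) -> bool:
    # Avg is rounded in the table, so allow about 1 % slack.
    return math.isclose(to_ms(avg) * calls, time_ms, rel_tol=rel_tol)


def failing_rows() -> list:
    """Return GPU row indices / API names whose Avg x Calls misses Time (ms)."""
    bad = [i for i, (_, ms, n, avg, _) in enumerate(GPU_ROWS)
           if not avg_matches(ms, n, avg)]
    bad += [name for name, (ms, n, avg) in API_ROWS.items()
            if not avg_matches(ms, n, avg)]
    return bad


if __name__ == "__main__":
    print(summarize())
    print("rows failing Avg x Calls check:", failing_rows())
```

With these rows, `summarize()` reproduces the table's totals, (100.04, 46.304, 1750) and (81.03, 37.5008, 1355), and `failing_rows()` returns an empty list. The call counts also line up across the two sections: the 1340 calls in named kernel rows equal the 1340 `cudaLaunch` calls, and the 395 calls in the two excluded unnamed rows equal the 395 `cudaMemcpyAsync` calls. That is consistent with, though not proof of, those rows being memory copies. On the API side, `cudaMalloc` accounts for 95.36 % of the time across only 13 calls.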