`python profile.py -f -j 1`

| Type | Time (%) | Time (ms) | Calls | Avg | Min | Max | Name |
|---|---|---|---|---|---|---|---|
| GPU activities | 24.41 | 2.6361 | 51 | 51.688us | 992ns | 2.5829ms | |
| GPU activities | 21.86 | 2.3607 | 12 | 196.73us | 159.14us | 242.76us | void spatialDepthwiseConvolutionUpdateOutput<f... |
| GPU activities | 10.65 | 1.1503 | 12 | 95.859us | 94.273us | 97.985us | void kernelPointwiseApply2<TensorTakeOp<float,... |
| GPU activities | 7.01 | 0.75732 | 20 | 37.866us | 1.6960us | 98.817us | void kernelPointwiseApply3<TensorAddOp<long>, ... |
| GPU activities | 6.6 | 0.71246 | 28 | 25.444us | 896ns | 686.67us | |
| GPU activities | 5.92 | 0.63933 | 24 | 26.638us | 25.536us | 28.513us | void indexSelectSmallIndex<float, unsigned int... |
| GPU activities | 5.74 | 0.6202 | 18 | 34.455us | 20.032us | 56.449us | void kernelPointwiseApply2<CopyOp<float, float... |
| GPU activities | 3.13 | 0.33792 | 12 | 28.160us | 15.264us | 66.112us | void kernelPointwiseApply3<TensorAddOp<float>,... |
| GPU activities | 2.66 | 0.28752 | 6 | 47.920us | 44.064us | 48.961us | void kernelPointwiseApply2<TensorDivConstantOp... |
| GPU activities | 2.6 | 0.28106 | 18 | 15.614us | 11.424us | 27.105us | void kernelPointwiseApply2<CopyOp<float, float... |
| GPU activities | 1.78 | 0.19191 | 6 | 31.984us | 28.800us | 35.137us | void kernelPointwiseApply3<TensorAddOp<float>,... |
| GPU activities | 1.66 | 0.17917 | 6 | 29.861us | 28.641us | 31.137us | void kernelPointwiseApply3<TensorSubOp<float>,... |
| GPU activities | 1.52 | 0.16375 | 3 | 54.581us | 53.984us | 55.360us | void indexSelectSmallIndex<float, unsigned int... |
| GPU activities | 1.47 | 0.15872 | 2 | 79.361us | 68.353us | 90.369us | void kernelReduceAllPass1<float, unsigned int,... |
| GPU activities | 0.47 | 0.051264 | 12 | 4.2720us | 4.1280us | 4.3840us | void kernelReduceAll<long, unsigned int, long,... |
| GPU activities | 0.44 | 0.047393 | 3 | 15.797us | 15.681us | 15.872us | void kernelPointwiseApply3<TensorSubOp<float>,... |
| GPU activities | 0.43 | 0.046304 | 12 | 3.8580us | 3.8080us | 3.9680us | void kernelReduceAll<long, unsigned int, long,... |
| GPU activities | 0.29 | 0.03104 | 3 | 10.346us | 10.272us | 10.432us | void kernelPointwiseApply2<Tensor_neg_Float_Op... |
| GPU activities | 0.28 | 0.029984 | 24 | 1.2490us | 960ns | 1.5040us | void kernelPointwiseApply2<TensorMulConstantOp... |
| GPU activities | 0.22 | 0.023744 | 20 | 1.1870us | 832ns | 2.0160us | void thrust::cuda_cub::core::_kernel_agent<thr... |
| GPU activities | 0.21 | 0.023072 | 24 | 961ns | 768ns | 1.1520us | void kernelPointwiseApply1<TensorFillOp<long>,... |
| GPU activities | 0.21 | 0.022561 | 3 | 7.5200us | 800ns | 20.609us | void kernelPointwiseApply1<TensorFillOp<float>... |
| GPU activities | 0.18 | 0.019488 | 12 | 1.6240us | 1.5680us | 1.7920us | void kernelPointwiseApply2<TensorRemainderOp<l... |
| GPU activities | 0.17 | 0.018048 | 12 | 1.5040us | 1.4720us | 1.6000us | void kernelPointwiseApply2<CopyOp<float, float... |
| GPU activities | 0.05 | 0.005376 | 2 | 2.6880us | 2.1760us | 3.2000us | void kernelReduceAllPass2<float, ReduceAdd<flo... |
| GPU activities | 0.02 | 0.002688 | 3 | 896ns | 864ns | 928ns | |
| GPU activities | 0.02 | 0.002304 | 2 | 1.1520us | 960ns | 1.3440us | void kernelPointwiseApply1<TensorDivConstantOp... |
| Total | 100 | 10.7997 | 350 | | | | |
| Total (no mem) | 68.99 | 7.45117 | 271 | | | | |
| API calls | 97.25 | 4322.41 | 12 | 360.20ms | 15.643us | 4.31414s | cudaMalloc |
| API calls | 2.12 | 94.343 | 1 | 94.343ms | 94.343ms | 94.343ms | cudaDeviceSynchronize |
| API calls | 0.15 | 6.5229 | 79 | 82.568us | 4.6270us | 4.2232ms | cudaMemcpyAsync |
| API calls | 0.13 | 5.8286 | 185 | 31.506us | 206ns | 1.3966ms | cuDeviceGetAttribute |
| API calls | 0.13 | 5.7194 | 2 | 2.8597ms | 2.6896ms | 3.0298ms | cudaGetDeviceProperties |
| API calls | 0.1 | 4.346 | 268 | 16.216us | 5.7400us | 106.02us | cudaLaunch |
| API calls | 0.04 | 1.7618 | 3130 | 562ns | 255ns | 35.853us | cudaGetDevice |
| API calls | 0.02 | 1.098 | 79 | 13.898us | 1.6890us | 209.45us | cudaStreamSynchronize |
| API calls | 0.02 | 0.82276 | 2 | 411.38us | 217.99us | 604.77us | cuDeviceTotalMem |
| API calls | 0.01 | 0.6179 | 991 | 623ns | 279ns | 6.7660us | cudaSetDevice |
| API calls | 0.01 | 0.54713 | 2 | 273.57us | 260.93us | 286.20us | cuDeviceGetName |
| API calls | 0.01 | 0.37292 | 1377 | 270ns | 110ns | 1.8720us | cudaSetupArgument |
| API calls | 0 | 0.11578 | 20 | 5.7890us | 2.8300us | 13.631us | cudaFuncGetAttributes |
| API calls | 0 | 0.092499 | 268 | 345ns | 148ns | 1.1260us | cudaConfigureCall |
| API calls | 0 | 0.089808 | 258 | 348ns | 122ns | 6.9550us | cudaGetLastError |
| API calls | 0 | 0.079046 | 3 | 26.348us | 5.0360us | 64.092us | cudaMemsetAsync |
| API calls | 0 | 0.018829 | 20 | 941ns | 449ns | 1.5140us | cudaDeviceGetAttribute |
| API calls | 0 | 0.010992 | 40 | 274ns | 105ns | 635ns | cudaPeekAtLastError |
| API calls | 0 | 0.00889 | 14 | 635ns | 253ns | 1.5200us | cudaGetDeviceCount |
| API calls | 0 | 0.002493 | 4 | 623ns | 202ns | 1.6440us | cuDeviceGetCount |
| API calls | 0 | 0.001507 | 3 | 502ns | 216ns | 948ns | cuDeviceGet |
| API calls | 0 | 0.001017 | 1 | 1.0170us | 1.0170us | 1.0170us | cuInit |
| API calls | 0 | 0.000657 | 1 | 657ns | 657ns | 657ns | cuDriverGetVersion |
| Total | 99.99 | 4444.81 | 6760 | | | | |