Spaces:
Sleeping
Sleeping
File size: 2,491 Bytes
29b9c56 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | ==1256== NVPROF is profiling process 1256, command: python profile.py --reference
==1256== Profiling application: python profile.py --reference
==1256== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 77.06% 2.5370ms 1 2.5370ms 2.5370ms 2.5370ms void spatialDepthwiseConvolutionUpdateOutput<float, float, unsigned int, int=0>(THCDeviceTensor<float, int=4, int, DefaultPtrTraits>, THCDeviceTensor<float, int=4, int, DefaultPtrTraits>, THCDeviceTensor<float, int=4, int, DefaultPtrTraits>, THCDeviceTensor<float, int=1, int, DefaultPtrTraits>, bool, unsigned int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)
22.94% 755.16us 2 377.58us 1.3120us 753.85us [CUDA memcpy HtoD]
API calls: 99.79% 5.51912s 3 1.83971s 16.654us 5.51815s cudaMalloc
0.09% 5.2473ms 185 28.363us 212ns 1.6248ms cuDeviceGetAttribute
0.08% 4.6585ms 2 2.3293ms 2.3182ms 2.3404ms cudaGetDeviceProperties
0.02% 844.48us 2 422.24us 11.044us 833.43us cudaMemcpyAsync
0.01% 469.29us 2 234.64us 226.95us 242.34us cuDeviceGetName
0.01% 443.22us 2 221.61us 219.13us 224.09us cuDeviceTotalMem
0.00% 84.783us 2 42.391us 6.2130us 78.570us cudaStreamSynchronize
0.00% 36.868us 1 36.868us 36.868us 36.868us cudaLaunch
0.00% 27.733us 25 1.1090us 295ns 9.7890us cudaGetDevice
0.00% 9.0390us 7 1.2910us 396ns 4.9040us cudaSetDevice
0.00% 4.9140us 13 378ns 165ns 1.2110us cudaGetDeviceCount
0.00% 4.0510us 20 202ns 147ns 746ns cudaSetupArgument
0.00% 2.5770us 4 644ns 217ns 1.7200us cuDeviceGetCount
0.00% 1.6750us 1 1.6750us 1.6750us 1.6750us cudaConfigureCall
0.00% 1.6450us 3 548ns 261ns 1.0070us cuDeviceGet
0.00% 1.0640us 1 1.0640us 1.0640us 1.0640us cuInit
0.00% 846ns 1 846ns 846ns 846ns cuDriverGetVersion
0.00% 307ns 1 307ns 307ns 307ns cudaGetLastError
|