-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
matmul_add                                                      0.00%       0.000us         0.00%       0.000us       0.000us      15.307us       148.87%      15.307us       5.102us             3
ProfilerStep*                                                   7.89%      50.265us        78.10%     497.426us     165.809us       0.000us         0.00%       7.367us       2.456us             3
matmul_add                                                     55.01%     350.366us        70.21%     447.161us     149.054us       0.000us         0.00%       7.367us       2.456us             3
void cutlass::Kernel2<cutlass_80_wmma_tensorop_bf16_...         0.00%       0.000us         0.00%       0.000us       0.000us       6.726us        65.42%       6.726us       1.681us             4
aten::matmul                                                    0.97%       6.193us        11.63%      74.072us      24.691us       0.000us         0.00%       4.708us       1.569us             3
aten::mm                                                        7.96%      50.716us        10.66%      67.879us      22.626us       4.708us        45.79%       4.708us       1.569us             3
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.556us        34.58%       3.556us       0.889us             4
aten::add                                                       2.34%      14.927us         3.57%      22.723us       7.574us       2.659us        25.86%       2.659us       0.886us             3
cudaDeviceGetAttribute                                          0.15%       0.982us         0.15%       0.982us       0.327us       0.000us         0.00%       0.000us       0.000us             3
cuLaunchKernel                                                  2.54%      16.181us         2.54%      16.181us       5.394us       0.000us         0.00%       0.000us       0.000us             3
cudaLaunchKernel                                                1.22%       7.796us         1.22%       7.796us       2.599us       0.000us         0.00%       0.000us       0.000us             3
cudaDeviceSynchronize                                          21.90%     139.467us        21.90%     139.467us     139.467us       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 636.893us
Self CUDA time total: 10.282us