-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
void cutlass::Kernel2<cutlass_80_tensorop_bf16_s1681...         0.00%       0.000us         0.00%       0.000us       0.000us       5.882ms        95.15%       5.882ms       1.470ms             4
                                             matmul_add         0.00%       0.000us         0.00%       0.000us       0.000us       5.034ms        81.44%       5.034ms       1.678ms             3
                                          ProfilerStep*         1.18%      83.901us         9.02%     642.143us     214.048us       0.000us         0.00%       4.871ms       1.624ms             3
                                             matmul_add         5.59%     397.996us         7.84%     558.242us     186.081us       0.000us         0.00%       4.871ms       1.624ms             3
                                           aten::matmul         0.36%      25.870us         1.69%     120.100us      40.033us       0.000us         0.00%       4.641ms       1.547ms             3
                                               aten::mm         1.05%      74.561us         1.32%      94.230us      31.410us       4.641ms        75.07%       4.641ms       1.547ms             3
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     299.900us         4.85%     299.900us      74.975us             4
                                              aten::add         0.34%      24.125us         0.56%      40.146us      13.382us     230.468us         3.73%     230.468us      76.823us             3
                                 cudaDeviceGetAttribute         0.02%       1.404us         0.02%       1.404us       0.468us       0.000us         0.00%       0.000us       0.000us             3
                                         cuLaunchKernel         0.26%      18.265us         0.26%      18.265us       6.088us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         0.22%      16.021us         0.22%      16.021us       5.340us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaDeviceSynchronize        90.98%       6.480ms        90.98%       6.480ms       6.480ms       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 7.122ms
Self CUDA time total: 6.182ms