JavaZero/traces/01_matmul_add/4096_bf16_cold_eager.txt
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
void cutlass::Kernel2<cutlass_80_tensorop_bf16_s1681...         0.00%       0.000us         0.00%       0.000us       0.000us       5.882ms        95.15%       5.882ms       1.470ms             4
matmul_add                                                      0.00%       0.000us         0.00%       0.000us       0.000us       5.034ms        81.44%       5.034ms       1.678ms             3
ProfilerStep*                                                   1.18%      83.901us         9.02%     642.143us     214.048us       0.000us         0.00%       4.871ms       1.624ms             3
matmul_add                                                      5.59%     397.996us         7.84%     558.242us     186.081us       0.000us         0.00%       4.871ms       1.624ms             3
aten::matmul                                                    0.36%      25.870us         1.69%     120.100us      40.033us       0.000us         0.00%       4.641ms       1.547ms             3
aten::mm                                                        1.05%      74.561us         1.32%      94.230us      31.410us       4.641ms        75.07%       4.641ms       1.547ms             3
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     299.900us         4.85%     299.900us      74.975us             4
aten::add                                                       0.34%      24.125us         0.56%      40.146us      13.382us     230.468us         3.73%     230.468us      76.823us             3
cudaDeviceGetAttribute                                          0.02%       1.404us         0.02%       1.404us       0.468us       0.000us         0.00%       0.000us       0.000us             3
cuLaunchKernel                                                  0.26%      18.265us         0.26%      18.265us       6.088us       0.000us         0.00%       0.000us       0.000us             3
cudaLaunchKernel                                                0.22%      16.021us         0.22%      16.021us       5.340us       0.000us         0.00%       0.000us       0.000us             3
cudaDeviceSynchronize                                          90.98%       6.480ms        90.98%       6.480ms       6.480ms       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 7.122ms
Self CUDA time total: 6.182ms
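
The percentages and totals in the table can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original trace. It copies four Self CUDA values from the table by hand, converts them to microseconds, and recomputes the Self CUDA % column against the reported 6.182ms total. The names SELF_CUDA_US, TOTAL_SELF_CUDA_US, and self_cuda_percent are hypothetical helpers introduced only for this illustration.

```python
# Values copied by hand from the table, in microseconds.
# All names in this block are illustrative and do not come from the trace.
SELF_CUDA_US = {
    "cutlass::Kernel2 (bf16 tensor-op GEMM)": 5882.0,  # 5.882ms
    "vectorized_elementwise_kernel": 299.9,            # 299.900us
    "aten::mm": 4641.0,                                # 4.641ms
    "aten::add": 230.468,                              # 230.468us
}
TOTAL_SELF_CUDA_US = 6182.0  # "Self CUDA time total: 6.182ms"


def self_cuda_percent(self_us, total_us=TOTAL_SELF_CUDA_US):
    """Share of total self CUDA time, rounded to two decimals like the table."""
    return round(100.0 * self_us / total_us, 2)


# Recompute the "Self CUDA %" column for each copied row.
shares = {name: self_cuda_percent(us) for name, us in SELF_CUDA_US.items()}

# The two device kernel rows together account for the reported total
# (to within the rounding of the printed values).
kernel_sum_us = (
    SELF_CUDA_US["cutlass::Kernel2 (bf16 tensor-op GEMM)"]
    + SELF_CUDA_US["vectorized_elementwise_kernel"]
)

# Self CUDA of aten::mm plus aten::add matches the 4.871ms CUDA total
# reported for the matmul_add range.
op_sum_us = SELF_CUDA_US["aten::mm"] + SELF_CUDA_US["aten::add"]

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct}%")
    print(f"kernel rows sum: {kernel_sum_us:.1f}us")
    print(f"aten::mm + aten::add: {op_sum_us:.3f}us")
```

Read this way, the GEMM kernel dominates device time, the elementwise add is a small remainder, and the 90.98% Self CPU spent in cudaDeviceSynchronize is the host waiting for those kernels rather than doing work of its own.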