JavaZero/traces/01_matmul_add/64_bf16_cold_eager.txt
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                             matmul_add         0.00%       0.000us         0.00%       0.000us       0.000us      15.307us       148.87%      15.307us       5.102us             3
                                          ProfilerStep*         7.89%      50.265us        78.10%     497.426us     165.809us       0.000us         0.00%       7.367us       2.456us             3
                                             matmul_add        55.01%     350.366us        70.21%     447.161us     149.054us       0.000us         0.00%       7.367us       2.456us             3
void cutlass::Kernel2<cutlass_80_wmma_tensorop_bf16_...         0.00%       0.000us         0.00%       0.000us       0.000us       6.726us        65.42%       6.726us       1.681us             4
                                           aten::matmul         0.97%       6.193us        11.63%      74.072us      24.691us       0.000us         0.00%       4.708us       1.569us             3
                                               aten::mm         7.96%      50.716us        10.66%      67.879us      22.626us       4.708us        45.79%       4.708us       1.569us             3
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.556us        34.58%       3.556us       0.889us             4
                                              aten::add         2.34%      14.927us         3.57%      22.723us       7.574us       2.659us        25.86%       2.659us       0.886us             3
                                 cudaDeviceGetAttribute         0.15%       0.982us         0.15%       0.982us       0.327us       0.000us         0.00%       0.000us       0.000us             3
                                         cuLaunchKernel         2.54%      16.181us         2.54%      16.181us       5.394us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         1.22%       7.796us         1.22%       7.796us       2.599us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaDeviceSynchronize        21.90%     139.467us        21.90%     139.467us     139.467us       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 636.893us
Self CUDA time total: 10.282us
