LayerNorm Benchmarks - Aggregated Results

This document combines benchmark results from two LayerNorm implementations: PyTorch LayerNorm (torch_layer_norm) and HF Kernels LayerNorm (hf_kernels_layer_norm).

Combined Summary and Visualization

[Figure: "Attention Implementation Latency" (latency.svg; Matplotlib v3.10.7, generated 2025-10-29T00:37:29). X axis: Workload (LN_B16_S2048_D4096, LN_B16_S2048_D8192, LN_B16_S4096_D4096, LN_B16_S4096_D8192). Y axis: Latency P50 (ms), ticks from 1.0 to 3.0. Series: torch_layer_norm, hf_kernels_layer_norm.]

Cell: combine | 4.26s

======================================================================
LOADING BENCHMARK DATA
======================================================================
✓ PyTorch LayerNorm             : /__w/kernels-benchmarks/kernels-benchmarks/benches/layer_norm/impls/.uvnote/cache/4403c31e9bef6e648597b4fcc9cfdc402678aaa4f90636b74325f12d334214a3
✓ HF Kernels LayerNorm          : /__w/kernels-benchmarks/kernels-benchmarks/benches/layer_norm/impls/.uvnote/cache/bd278151199f29b397d85857b87922edaa39a62623fb28e0465de47d6a3bac74

  ✓ Found PyTorch LayerNorm
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/layer_norm/impls/.uvnote/cache/4403c31e9bef6e648597b4fcc9cfdc402678aaa4f90636b74325f12d334214a3/layer_norm.jsonl
  ✓ Found HF Kernels LayerNorm
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/layer_norm/impls/.uvnote/cache/bd278151199f29b397d85857b87922edaa39a62623fb28e0465de47d6a3bac74/layer_norm.jsonl

======================================================================
Summary: 2 found, 0 skipped, 0 missing
======================================================================

COMBINED BENCHMARK SUMMARY

impl                     wl                  p50(ms)  ok
hf_kernels_layer_norm    LN_B16_S2048_D4096     0.83  True
hf_kernels_layer_norm    LN_B16_S2048_D8192     1.66  True
hf_kernels_layer_norm    LN_B16_S4096_D4096     1.65  True
hf_kernels_layer_norm    LN_B16_S4096_D8192     3.26  True
torch_layer_norm         LN_B16_S2048_D4096     0.82  True
torch_layer_norm         LN_B16_S2048_D8192     1.68  True
torch_layer_norm         LN_B16_S4096_D4096     1.61  True
torch_layer_norm         LN_B16_S4096_D8192     3.32  True

GENERATING COMBINED VISUALIZATION

Loaded 8 records
✓ Visualization saved as latency.svg
Saved latency.png
✓ Visualization saved as latency.svg
✓ SVG visualization ready!

ANALYSIS COMPLETE
Total implementations analyzed: 2

Implementations included:
  ✓ PyTorch LayerNorm
  ✓ HF Kernels LayerNorm
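
The summary table lists the two implementations in separate row groups, which makes a per-workload comparison hard to read at a glance. The following is a minimal sketch, not part of the benchmark tooling, that pairs the p50 values by workload and reports the relative difference of hf_kernels_layer_norm against torch_layer_norm. The dictionary name P50_MS and the helper relative_difference are hypothetical names introduced for illustration; the numbers are copied from the table above.

```python
# p50 latencies in milliseconds, copied from the combined summary table.
# P50_MS and relative_difference are illustrative names, not part of the
# benchmark scripts that produced this report.
P50_MS = {
    "LN_B16_S2048_D4096": {"hf_kernels_layer_norm": 0.83, "torch_layer_norm": 0.82},
    "LN_B16_S2048_D8192": {"hf_kernels_layer_norm": 1.66, "torch_layer_norm": 1.68},
    "LN_B16_S4096_D4096": {"hf_kernels_layer_norm": 1.65, "torch_layer_norm": 1.61},
    "LN_B16_S4096_D8192": {"hf_kernels_layer_norm": 3.26, "torch_layer_norm": 3.32},
}


def relative_difference(p50_ms, baseline="torch_layer_norm",
                        candidate="hf_kernels_layer_norm"):
    """Return {workload: percent difference of candidate vs. baseline}.

    A negative value means the candidate has the lower (better) p50 latency.
    """
    return {
        wl: (row[candidate] - row[baseline]) / row[baseline] * 100.0
        for wl, row in p50_ms.items()
    }


if __name__ == "__main__":
    for wl, pct in relative_difference(P50_MS).items():
        print(f"{wl:<20} {pct:+.1f}%")
```

On these numbers the two implementations stay within about 2.5% of each other on every workload, with hf_kernels_layer_norm slightly ahead on the D8192 workloads and torch_layer_norm slightly ahead on the D4096 workloads. Since the table reports a single p50 value per workload at two decimal places, differences of this size may well be within run-to-run noise.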

Artifacts:

latency.svg