OpenAI-style MoE (Mixture of Experts) Benchmarks - Aggregated Results

This document combines benchmark results from two OpenAI-style MoE implementations, Binned PyTorch (binned_torch) and GptOssExperts (gpt_oss_experts), measured on eight CUDA workloads.

Combined Summary and Visualization

[Figure: latency.svg (Matplotlib v3.10.8, generated 2025-12-19T23:02:40). Title: "Attention Implementation Latency". X axis: Workload (cuda_B1_S512_E2 through cuda_B4_S1024_E4). Y axis: Latency P50 (ms), 0 to 1600. Series: binned_torch, gpt_oss_experts.]
======================================================================
LOADING BENCHMARK DATA
======================================================================
✓ Binned PyTorch                : /__w/kernels-benchmarks/kernels-benchmarks/benches/openai_moe/impls/.uvnote/cache/fd01907ce582015b5dd52e56081cc8e2a21813f73271b422308d60a8ab9391af
✓ GptOssExperts                 : /__w/kernels-benchmarks/kernels-benchmarks/benches/openai_moe/impls/.uvnote/cache/002e3e7d42f2dbf6d5e5216db57e56aa649bc6ac59ce4131ce80c5849e52482b

  ✓ Found Binned PyTorch
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/openai_moe/impls/.uvnote/cache/fd01907ce582015b5dd52e56081cc8e2a21813f73271b422308d60a8ab9391af/openai_moe.jsonl
  ✓ Found GptOssExperts
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/openai_moe/impls/.uvnote/cache/002e3e7d42f2dbf6d5e5216db57e56aa649bc6ac59ce4131ce80c5849e52482b/openai_moe.jsonl

======================================================================
Summary: 2 found, 0 skipped, 0 missing
======================================================================

COMBINED BENCHMARK SUMMARY

impl                     wl                  p50(ms)  ok
binned_torch             cuda_B1_S1024_E2     367.98  True
binned_torch             cuda_B1_S1024_E4     396.30  True
binned_torch             cuda_B1_S512_E2      154.35  True
binned_torch             cuda_B1_S512_E4      195.55  True
binned_torch             cuda_B4_S1024_E2    1510.09  True
binned_torch             cuda_B4_S1024_E4    1618.05  True
binned_torch             cuda_B4_S512_E2      733.47  True
binned_torch             cuda_B4_S512_E4      787.61  True
gpt_oss_experts          cuda_B1_S1024_E2       3.87  True
gpt_oss_experts          cuda_B1_S1024_E4       5.34  True
gpt_oss_experts          cuda_B1_S512_E2        2.66  True
gpt_oss_experts          cuda_B1_S512_E4        3.95  True
gpt_oss_experts          cuda_B4_S1024_E2      13.39  True
gpt_oss_experts          cuda_B4_S1024_E4      13.41  True
gpt_oss_experts          cuda_B4_S512_E2        6.80  True
gpt_oss_experts          cuda_B4_S512_E4        7.53  True

GENERATING COMBINED VISUALIZATION

Loaded 16 records
✓ Visualization saved as latency.svg
Saved latency.png
✓ SVG visualization ready!

ANALYSIS COMPLETE
Total implementations analyzed: 2

Implementations included:
  ✓ Binned PyTorch
  ✓ GptOssExperts
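
The summary table lists absolute p50 latencies only, so the relative gap between the two implementations is easier to read as a per-workload ratio. The following is a minimal sketch, not part of the benchmark harness: the latencies are copied by hand from the table above, the helper names `parse_workload` and `speedups` are hypothetical, and the reading of the workload label as `<device>_B<batch>_S<sequence length>_E<experts>` is an assumption based on the label pattern rather than something the output states.

```python
import re

# p50 latencies in milliseconds, copied from the combined summary table.
P50_MS = {
    "binned_torch": {
        "cuda_B1_S1024_E2": 367.98,
        "cuda_B1_S1024_E4": 396.30,
        "cuda_B1_S512_E2": 154.35,
        "cuda_B1_S512_E4": 195.55,
        "cuda_B4_S1024_E2": 1510.09,
        "cuda_B4_S1024_E4": 1618.05,
        "cuda_B4_S512_E2": 733.47,
        "cuda_B4_S512_E4": 787.61,
    },
    "gpt_oss_experts": {
        "cuda_B1_S1024_E2": 3.87,
        "cuda_B1_S1024_E4": 5.34,
        "cuda_B1_S512_E2": 2.66,
        "cuda_B1_S512_E4": 3.95,
        "cuda_B4_S1024_E2": 13.39,
        "cuda_B4_S1024_E4": 13.41,
        "cuda_B4_S512_E2": 6.80,
        "cuda_B4_S512_E4": 7.53,
    },
}

# ASSUMPTION: labels follow <device>_B<batch>_S<seq_len>_E<experts>.
WORKLOAD_RE = re.compile(
    r"^(?P<device>[a-z]+)_B(?P<batch>\d+)_S(?P<seq>\d+)_E(?P<experts>\d+)$"
)


def parse_workload(label):
    """Split a workload label into (device, batch, seq_len, experts)."""
    match = WORKLOAD_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized workload label: {label!r}")
    return (
        match["device"],
        int(match["batch"]),
        int(match["seq"]),
        int(match["experts"]),
    )


def speedups(baseline="binned_torch", candidate="gpt_oss_experts"):
    """Return baseline_p50 / candidate_p50 for every workload of the baseline."""
    return {
        workload: P50_MS[baseline][workload] / P50_MS[candidate][workload]
        for workload in P50_MS[baseline]
    }


if __name__ == "__main__":
    # Print workloads ordered by (device, batch, seq_len, experts).
    for workload, ratio in sorted(speedups().items(), key=lambda kv: parse_workload(kv[0])):
        print(f"{workload:<20} {ratio:7.1f}x")
```

On the numbers in the table, the ratio ranges from roughly 50x (cuda_B1_S512_E4) to roughly 121x (cuda_B4_S1024_E4) in favor of gpt_oss_experts. These are ratios of single p50 values, so they say nothing about run-to-run variance.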

Artifacts:

latency.svg