Judge Mode Walkthrough

Follow this sequence when demonstrating ROCmPort AI during a technical evaluation. It assumes the current React UI and the FastAPI SSE stream.

Goal

Make every claim falsifiable and tied to fields returned by the backend.

Flow

  1. Open http://localhost:8000/index.html.
  2. Choose or paste a CUDA kernel.
  3. Run ROCmPort AI and watch the five agent cards: analyzer, translator, optimizer, tester, coordinator.
  4. Confirm the tester event reports speedup, bandwidth, bottleneck, and data source.
  5. Confirm the coordinator event produces the final report JSON in its SSE detail.
  6. Use /benchmark-report for reproducible demo-artifact metrics and data-source labels.
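  7. Show a limited-gain case such as vector_add and explain why the result is bandwidth-bound: the kernel does very little arithmetic per byte moved, so memory throughput caps the achievable speedup.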
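
Step 4 can be checked mechanically rather than by eye. The sketch below is a minimal, assumption-laden example: it parses raw SSE text and reports which expected fields are missing from the tester event. The event shape (`agent` and `detail` keys) and the field names `speedup`, `bandwidth`, and `bottleneck` are hypothetical; only `data_source` is named in this document, so adjust the names to match what the backend actually emits. The sample payload values are illustrative, not measured.

```python
import json

# Assumed field names for the tester event. Only `data_source` appears in this
# document; the other three are hypothetical stand-ins for the fields in step 4.
REQUIRED_TESTER_FIELDS = ("speedup", "bandwidth", "bottleneck", "data_source")


def parse_sse(raw: str) -> list[dict]:
    """Parse SSE text into a list of JSON payloads taken from `data:` lines."""
    events = []
    # SSE events are separated by a blank line.
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


def missing_tester_fields(events: list[dict]) -> list[str]:
    """Return required fields absent from the first tester event.

    If no tester event exists, every required field counts as missing.
    """
    for event in events:
        if event.get("agent") == "tester":
            detail = event.get("detail", {})
            return [f for f in REQUIRED_TESTER_FIELDS if f not in detail]
    return list(REQUIRED_TESTER_FIELDS)


# Illustrative stream text, not captured from a real run.
sample = (
    'data: {"agent": "analyzer", "detail": {}}\n\n'
    'data: {"agent": "tester", "detail": {"speedup": 1.0, "bandwidth": 0.8, '
    '"bottleneck": "memory", "data_source": "demo_artifact"}}\n\n'
)
print(missing_tester_fields(parse_sse(sample)))
```

An empty list means the tester event carried every expected field. Any name in the returned list is a claim the demo cannot yet back with backend output.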

Baseline Policy

  • Primary baseline: unmodified hipify output, with only the edits required to make it compile.
  • Demo-mode baselines come from backend/tools/demo_artifacts.py.
  • Real hardware baselines require ROCM_AVAILABLE=true and captured hipcc/rocprof logs.
  • Never mix demo_artifact and real_rocm numbers in the same result table.
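
The no-mixing rule is easy to enforce before a table is rendered. The following sketch assumes each result row is a dict carrying a `data_source` key whose value is either `demo_artifact` or `real_rocm`, as named above. The row structure and the helper name `table_data_source` are hypothetical and not part of the ROCmPort AI codebase.

```python
# The two labels named in the baseline policy.
ALLOWED_SOURCES = {"demo_artifact", "real_rocm"}


def table_data_source(rows: list[dict]) -> str:
    """Return the single data_source shared by all rows, or raise ValueError.

    Rejects empty tables, rows with a missing or unknown label, and tables
    that combine demo_artifact rows with real_rocm rows.
    """
    if not rows:
        raise ValueError("empty result table")
    sources = {row.get("data_source") for row in rows}
    unknown = sources - ALLOWED_SOURCES
    if unknown:
        labels = sorted(str(s) for s in unknown)
        raise ValueError(f"missing or unknown data_source values: {labels}")
    if len(sources) != 1:
        raise ValueError(f"table mixes data sources: {sorted(sources)}")
    return next(iter(sources))


# Illustrative rows; the kernel names and labels are examples, not results.
rows = [
    {"kernel": "vector_add", "data_source": "demo_artifact"},
    {"kernel": "reduction", "data_source": "demo_artifact"},
]
print(table_data_source(rows))
```

Raising instead of silently filtering is a deliberate choice here: a mixed table is a reporting error that should stop the demo, not something to repair on the fly.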

Visible Artifacts In Current UI

  • CUDA source input.
  • Agent event stream.
  • Tester summary: execution time, bandwidth utilization, bottleneck, notes.
  • Final summary footer: changes made, critical bugs found, compile/migration success, data source.

Additional Artifacts Available By API

  • /benchmark-report: reproducible benchmark summary and static risk scans.
  • /export: migration diff, original CUDA, optimized HIP, and report markdown.
  • /demo-kernels: source for bundled demo kernels.

Pass/Fail Criteria

A demo is credible if all of the following hold:

  • Every speedup is tied to its data_source.
  • The baseline definition is stated before showing speedup.
  • Static risk findings match the analyzer event or /benchmark-report.
  • At least one non-perfect or limited-gain case is included.
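
For the limited-gain case, a short roofline-style calculation makes the bandwidth-bound explanation concrete. The sketch below assumes a float32 vector_add that reads two values and writes one per element (12 bytes, 1 FLOP). The peak figures are placeholders chosen for illustration, not the specification of any particular GPU, and `max_speedup` assumes the optimized kernel moves the same number of bytes as the baseline.

```python
def arithmetic_intensity(flops_per_elem: float, bytes_per_elem: float) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops_per_elem / bytes_per_elem


def is_bandwidth_bound(intensity: float, peak_gflops: float, peak_gbs: float) -> bool:
    """A kernel is bandwidth-bound when its intensity is below the ridge point.

    The ridge point (peak_gflops / peak_gbs) is the intensity at which the
    memory roof and the compute roof meet.
    """
    return intensity < peak_gflops / peak_gbs


def max_speedup(bandwidth_utilization: float) -> float:
    """Upper bound on speedup for a bandwidth-bound kernel.

    Assumes the bytes moved stay fixed, so the best case is reaching 100%
    of peak bandwidth from the current utilization (a fraction in (0, 1]).
    """
    if not 0.0 < bandwidth_utilization <= 1.0:
        raise ValueError("bandwidth_utilization must be in (0, 1]")
    return 1.0 / bandwidth_utilization


# float32 vector_add: c[i] = a[i] + b[i]
# 2 loads + 1 store of 4 bytes each = 12 bytes, and 1 add = 1 FLOP.
ai = arithmetic_intensity(flops_per_elem=1, bytes_per_elem=12)

# Placeholder peaks for illustration only (not a real device spec).
PEAK_GFLOPS = 20000.0
PEAK_GBS = 1000.0

print(is_bandwidth_bound(ai, PEAK_GFLOPS, PEAK_GBS))
print(max_speedup(0.8))
```

Under these assumptions, a baseline already running at 80% of peak bandwidth cannot gain more than 1.25x from further tuning, which is the kind of bounded result the last criterion asks the demo to show.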