# Judge Mode Walkthrough

Use this sequence during technical evaluation with the current React UI and FastAPI SSE stream.

## Goal

Make every claim falsifiable and tied to fields returned by the backend.

## Flow

1. Open `http://localhost:8000/index.html`.
2. Choose or paste a CUDA kernel.
3. Run ROCmPort AI and watch the five agent cards: analyzer, translator, optimizer, tester, coordinator.
4. Confirm the tester event reports speedup, bandwidth, bottleneck, and data source.
5. Confirm the coordinator event produces the final report JSON in its SSE `detail`.
6. Use `/benchmark-report` for reproducible demo-artifact metrics and data-source labels.
7. Show a limited-gain case such as `vector_add` and explain the bandwidth-bound result.

## Baseline Policy

- Primary baseline: straight hipify output with minimal required compile edits.
- Demo-mode baselines come from `backend/tools/demo_artifacts.py`.
- Real hardware baselines require `ROCM_AVAILABLE=true` and captured `hipcc`/`rocprof` logs.
- Never mix `demo_artifact` and `real_rocm` numbers in the same result table.

## Visible Artifacts In Current UI

- CUDA source input.
- Agent event stream.
- Tester summary: execution time, bandwidth utilization, bottleneck, notes.
- Final summary footer: changes made, critical bugs found, compile/migration success, data source.

## Additional Artifacts Available By API

- `/benchmark-report`: reproducible benchmark summary and static risk scans.
- `/export`: migration diff, original CUDA, optimized HIP, and report markdown.
- `/demo-kernels`: source for bundled demo kernels.

## Pass/Fail Criteria

A demo is credible if:

- Every speedup is tied to its `data_source`.
- The baseline definition is stated before showing speedup.
- Static risk findings match the analyzer event or `/benchmark-report`.
- At least one non-perfect or limited-gain case is included.
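
Some of the pass/fail criteria can be checked mechanically before a result table is shown. The sketch below is one possible way to do that, not part of the ROCmPort AI backend. It assumes each table row is a dict with a `data_source` field holding `demo_artifact` or `real_rocm`, as named in this document. The `kernel` and `speedup` keys, the `check_result_table` helper, and the 1.1x limited-gain threshold are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List

# Data-source labels named in this document.
KNOWN_SOURCES = {"demo_artifact", "real_rocm"}

# Assumption: a speedup at or below this counts as a "limited-gain" case.
LIMITED_GAIN_THRESHOLD = 1.1


def check_result_table(rows: List[Dict[str, Any]]) -> List[str]:
    """Return a list of credibility problems; an empty list means none found.

    The row keys `kernel` and `speedup` are hypothetical; adapt them to the
    fields the backend actually returns.
    """
    problems: List[str] = []
    sources = set()

    for i, row in enumerate(rows):
        name = row.get("kernel", f"row {i}")
        source = row.get("data_source")
        # Every speedup must be tied to a recognized data source.
        if "speedup" in row and source not in KNOWN_SOURCES:
            problems.append(f"{name}: speedup has no recognized data_source")
        if source in KNOWN_SOURCES:
            sources.add(source)

    # Demo-artifact and real-hardware numbers must not share a table.
    if len(sources) > 1:
        problems.append("table mixes demo_artifact and real_rocm numbers")

    # At least one non-perfect or limited-gain case should be present.
    has_limited_gain = any(
        row.get("speedup", float("inf")) <= LIMITED_GAIN_THRESHOLD
        for row in rows
    )
    if not has_limited_gain:
        problems.append("no limited-gain case included")

    return problems
```

Calling `check_result_table` on the rows of a draft table and refusing to present it unless the returned list is empty covers the data-source and limited-gain criteria. The baseline statement and the static risk comparison still need a manual check.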