| # Judge Mode Walkthrough |
|
|
| Use this sequence during technical evaluation with the current React UI and |
| the FastAPI SSE stream. |
|
|
| ## Goal |
|
|
| Make every claim falsifiable and tied to fields returned by the backend. |
|
|
| ## Flow |
|
|
| 1. Open `http://localhost:8000/index.html`. |
| 2. Choose or paste a CUDA kernel. |
| 3. Run ROCmPort AI and watch the five agent cards: |
| analyzer, translator, optimizer, tester, coordinator. |
| 4. Confirm the tester event reports speedup, bandwidth utilization, bottleneck, and data source. |
| 5. Confirm the coordinator event produces the final report JSON in its SSE `detail`. |
| 6. Use `/benchmark-report` for reproducible demo-artifact metrics and data-source labels. |
| 7. Show a limited-gain case such as `vector_add` and explain the bandwidth-bound result: the kernel does almost no arithmetic per byte moved, so memory bandwidth caps any speedup. |
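
Steps 4 and 5 can be checked mechanically rather than by eye. The following is a minimal sketch, assuming each SSE event carries a JSON payload on its `data:` line with an `agent` name and a `detail` object. The key names `agent`, `speedup`, `bandwidth`, `bottleneck`, and `data_source` are illustrative assumptions, not confirmed backend fields, and the sample values are placeholders rather than measurements.

```python
import json

# Hypothetical field names; adjust to whatever the tester event actually returns.
REQUIRED_TESTER_FIELDS = ("speedup", "bandwidth", "bottleneck", "data_source")


def parse_sse_events(stream_text):
    """Split raw SSE text into events and decode each JSON `data:` payload."""
    events = []
    data_lines = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one SSE event.
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:
        # Tolerate a final event that lacks its trailing blank line.
        events.append(json.loads("\n".join(data_lines)))
    return events


def missing_tester_fields(events):
    """Return the required fields absent from the first tester event."""
    for event in events:
        if event.get("agent") == "tester":
            detail = event.get("detail", {})
            return [f for f in REQUIRED_TESTER_FIELDS if f not in detail]
    # No tester event at all: every field is missing.
    return list(REQUIRED_TESTER_FIELDS)


# Placeholder stream for illustration only; the numbers are not real results.
sample = (
    'data: {"agent": "analyzer", "detail": {}}\n\n'
    'data: {"agent": "tester", "detail": {"speedup": 1.02, "bandwidth": 0.81, '
    '"bottleneck": "memory", "data_source": "demo_artifact"}}\n\n'
)
```

Calling `missing_tester_fields(parse_sse_events(sample))` returns an empty list when the tester event is complete, which gives a judge a yes/no answer tied directly to the stream contents.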
|
|
| ## Baseline Policy |
|
|
| - Primary baseline: unmodified hipify output plus only the edits required to make it compile. |
| - Demo-mode baselines come from `backend/tools/demo_artifacts.py`. |
| - Real-hardware baselines require `ROCM_AVAILABLE=true` and captured `hipcc`/`rocprof` logs. |
| - Never mix `demo_artifact` and `real_rocm` numbers in the same result table. |
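
The no-mixing rule is easy to enforce before any table is rendered. This is a rough sketch, assuming each result row is a dict with a `data_source` key holding either `demo_artifact` or `real_rocm`. The function name `split_by_data_source` and the `kernel` key are hypothetical and chosen only for illustration.

```python
# The two labels named in the baseline policy above.
ALLOWED_SOURCES = {"demo_artifact", "real_rocm"}


def split_by_data_source(rows):
    """Group result rows into one table per data source.

    Raises ValueError for a row whose data_source is missing or unknown,
    so an unlabeled number can never slip into a table.
    """
    tables = {}
    for row in rows:
        source = row.get("data_source")
        if source not in ALLOWED_SOURCES:
            raise ValueError(
                f"row {row.get('kernel')!r} has unknown data_source {source!r}"
            )
        tables.setdefault(source, []).append(row)
    return tables
```

Rendering each returned group as its own table keeps demo-mode and real-hardware numbers physically separate, so a reader cannot compare them by accident.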
|
|
| ## Visible Artifacts In Current UI |
|
|
| - CUDA source input. |
| - Agent event stream. |
| - Tester summary: execution time, bandwidth utilization, bottleneck, notes. |
| - Final summary footer: changes made, critical bugs found, compile/migration success, data source. |
|
|
| ## Additional Artifacts Available By API |
|
|
| - `/benchmark-report`: reproducible benchmark summary and static risk scans. |
| - `/export`: migration diff, original CUDA, optimized HIP, and report markdown. |
| - `/demo-kernels`: source for bundled demo kernels. |
|
|
| ## Pass/Fail Criteria |
|
|
| A demo is credible only if all of the following hold: |
|
|
| - Every speedup is tied to its `data_source`. |
| - The baseline definition is stated before showing speedup. |
| - Static risk findings match the analyzer event or `/benchmark-report`. |
| - At least one non-perfect or limited-gain case is included. |
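
The four criteria can be turned into a small checklist function. The sketch below makes several assumptions: result rows are dicts with `speedup` and `data_source` keys, the two boolean arguments are filled in by the presenter, and a "limited-gain" case means a speedup at or below a threshold of 1.1. That threshold and all names here are illustrative, not project-defined.

```python
def credibility_failures(results, baseline_stated, static_findings_match,
                         limited_gain_threshold=1.1):
    """Return a list of failed criteria; an empty list means the demo passes.

    limited_gain_threshold is an assumed cutoff, not a value from the project.
    """
    failures = []
    if any(not row.get("data_source") for row in results):
        failures.append("a speedup is not tied to a data_source")
    if not baseline_stated:
        failures.append("baseline definition was not stated before speedups")
    if not static_findings_match:
        failures.append("static risk findings do not match analyzer or /benchmark-report")
    has_limited_gain = any(
        row.get("speedup", float("inf")) <= limited_gain_threshold
        for row in results
    )
    if not has_limited_gain:
        failures.append("no non-perfect or limited-gain case is included")
    return failures
```

Returning the list of failures, rather than a single boolean, lets the presenter state exactly which criterion was missed.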
|
|