# Judge Mode Walkthrough
Use this sequence during technical evaluation with the current React UI and
FastAPI SSE stream.
## Goal
Make every claim falsifiable and tied to fields returned by the backend.
## Flow
1. Open `http://localhost:8000/index.html`.
2. Choose or paste a CUDA kernel.
3. Run ROCmPort AI and watch the five agent cards: analyzer, translator, optimizer, tester, coordinator.
4. Confirm the tester event reports speedup, bandwidth, bottleneck, and data source.
5. Confirm the coordinator event produces the final report JSON in its SSE `detail`.
6. Use `/benchmark-report` for reproducible demo-artifact metrics and data-source labels.
7. Show a limited-gain case such as `vector_add` and explain the bandwidth-bound result.
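
Steps 4 and 5 can be checked mechanically rather than by eye. The sketch below parses raw SSE text and reports which of the fields named in step 4 are missing from the tester event. It assumes each `data:` line carries a JSON object with an `agent` key and a `detail` object, and that the fields are keyed `speedup`, `bandwidth`, `bottleneck`, and `data_source`. These names are illustrative assumptions, so adjust them to match the actual backend payload.

```python
import json

# Assumed key names inside the tester event's detail; the real payload may differ.
REQUIRED_TESTER_FIELDS = ("speedup", "bandwidth", "bottleneck", "data_source")


def parse_sse(text):
    """Decode raw SSE text into a list of JSON payloads, one per event."""
    events = []
    data_lines = []
    for line in text.splitlines():
        if line.startswith("data:"):
            # Collect the payload of each data line belonging to this event.
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:
        # Tolerate a stream that ends without a trailing blank line.
        events.append(json.loads("\n".join(data_lines)))
    return events


def missing_tester_fields(events):
    """Return the required fields absent from the tester event's detail."""
    for event in events:
        if event.get("agent") == "tester":  # assumed discriminator key
            detail = event.get("detail", {})
            return [f for f in REQUIRED_TESTER_FIELDS if f not in detail]
    # No tester event at all: every field counts as missing.
    return list(REQUIRED_TESTER_FIELDS)
```

An empty list from `missing_tester_fields` means the tester event carried every field a judge is expected to confirm.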
## Baseline Policy
- Primary baseline: straight hipify output with minimal required compile edits.
- Demo-mode baselines come from `backend/tools/demo_artifacts.py`.
- Real hardware baselines require `ROCM_AVAILABLE=true` and captured `hipcc`/`rocprof` logs.
- Never mix `demo_artifact` and `real_rocm` numbers in the same result table.
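
The no-mixing rule is easy to enforce before a table is rendered. The following is a minimal sketch, assuming each row of a result table is a dict with a `data_source` key. The function name and row shape are hypothetical and not part of the backend.

```python
def check_single_source(rows):
    """Return the one data source shared by all rows, or raise if they are mixed.

    Each row is assumed to be a dict with a "data_source" key whose value is
    a label such as "demo_artifact" or "real_rocm".
    """
    sources = {row["data_source"] for row in rows}
    if len(sources) > 1:
        raise ValueError(
            "result table mixes data sources: " + ", ".join(sorted(sources))
        )
    # An empty table has no source to report.
    return sources.pop() if sources else None
```

Calling this on the rows of each table, and failing loudly on `ValueError`, keeps demo-artifact and real-hardware numbers from appearing side by side.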
## Visible Artifacts In Current UI
- CUDA source input.
- Agent event stream.
- Tester summary: execution time, bandwidth utilization, bottleneck, notes.
- Final summary footer: changes made, critical bugs found, compile/migration success, data source.
## Additional Artifacts Available By API
- `/benchmark-report`: reproducible benchmark summary and static risk scans.
- `/export`: migration diff, original CUDA, optimized HIP, and report markdown.
- `/demo-kernels`: source for bundled demo kernels.
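
For scripted checks, these endpoints can be read with the standard library alone. The sketch below assumes the server listens at `http://localhost:8000` (the host used in the Flow section), that the endpoint accepts a plain `GET`, and that it returns JSON. None of these is confirmed here, so verify the method and response type against the FastAPI routes before relying on it.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # host and port as used in the Flow section


def endpoint_url(path, base=BASE_URL):
    """Build an absolute URL for an API path such as "/benchmark-report"."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(path, base=BASE_URL, timeout=10):
    """GET the endpoint and decode the body as JSON (assumed response type)."""
    with urlopen(endpoint_url(path, base), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json("/benchmark-report")` would return the decoded report if both assumptions hold.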
## Pass/Fail Criteria
A demo is credible only if all of the following hold:
- Every speedup is tied to its `data_source`.
- The baseline definition is stated before showing speedup.
- Static risk findings match the analyzer event or `/benchmark-report`.
- At least one non-perfect or limited-gain case is included.
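
Three of these criteria can be expressed as a small checker. This is a sketch under stated assumptions: each result is a dict with optional `kernel`, `speedup`, and `data_source` keys, and a "limited-gain" case is taken to mean a speedup at or below 1.1x. That threshold is an illustrative choice and is not defined by this document. Matching static risk findings against the analyzer event is left out because it depends on the report format.

```python
LIMITED_GAIN_THRESHOLD = 1.1  # assumption: speedups at or below this count as limited gain


def credibility_failures(results, baseline_definition):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []

    # The baseline must be stated before any speedup is shown.
    if not baseline_definition.strip():
        failures.append("baseline definition missing")

    # Every speedup must be tied to its data source.
    for row in results:
        if "speedup" in row and not row.get("data_source"):
            failures.append(f"{row.get('kernel', '?')}: speedup without data_source")

    # At least one limited-gain case must be part of the demo.
    has_limited_gain = any(
        row.get("speedup", float("inf")) <= LIMITED_GAIN_THRESHOLD for row in results
    )
    if not has_limited_gain:
        failures.append("no limited-gain case included")

    return failures
```

A run that includes `vector_add` with a speedup close to 1.0 and a labeled `data_source` on every row would satisfy all three checks.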