ROCmPort-AI / docs /JUDGE_MODE.md

tazwarrrr

docs fix

e7a1a69 17 days ago

	# Judge Mode Walkthrough

	Use this sequence during technical evaluation with the current React UI and
	FastAPI SSE stream.

	## Goal

	Make every claim falsifiable by tying it to a field returned by the backend.

	## Flow

	1. Open `http://localhost:8000/index.html`.
	2. Choose or paste a CUDA kernel.
	3. Run ROCmPort AI and watch the five agent cards: analyzer, translator, optimizer, tester, coordinator.
	4. Confirm the tester event reports speedup, bandwidth, bottleneck, and data source.
	5. Confirm the coordinator event produces the final report JSON in its SSE `detail`.
	6. Use `/benchmark-report` for reproducible demo-artifact metrics and data-source labels.
	7. Show a limited-gain case such as `vector_add` and explain the bandwidth-bound result.
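
Steps 4 and 5 can be checked mechanically instead of by eye. The following is a minimal sketch, assuming each SSE `data:` line carries a JSON object with an `agent` key. Only `data_source` and `detail` are named in this document; the other field names and the sample stream are hypothetical placeholders, not real backend output.

```python
import json

# Hypothetical field names; adjust them to match the real tester payload.
REQUIRED_TESTER_FIELDS = ("speedup", "bandwidth", "bottleneck", "data_source")

# Made-up sample stream for illustration only (not captured from the backend).
SAMPLE_STREAM = (
    'data: {"agent": "tester", "speedup": 1.0, "bandwidth": 0.5, '
    '"bottleneck": "memory", "data_source": "demo_artifact"}\n'
    "\n"
    'data: {"agent": "coordinator", "detail": {"data_source": "demo_artifact"}}\n'
    "\n"
)


def parse_sse(text):
    """Parse raw SSE text into a list of decoded JSON payloads, one per event."""
    events = []
    data_lines = []
    for line in text.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates an SSE event.
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
    if data_lines:  # stream ended without a trailing blank line
        events.append(json.loads("\n".join(data_lines)))
    return events


def missing_tester_fields(event):
    """Return the required fields that are absent from a tester event."""
    return [name for name in REQUIRED_TESTER_FIELDS if name not in event]


if __name__ == "__main__":
    for event in parse_sse(SAMPLE_STREAM):
        if event.get("agent") == "tester":
            print("missing tester fields:", missing_tester_fields(event))
```

An empty list from `missing_tester_fields` means the tester event exposes every field the walkthrough asks the judge to confirm.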

	## Baseline Policy

	- Primary baseline: unoptimized hipify output, with only the edits required to make it compile.
	- Demo-mode baselines come from `backend/tools/demo_artifacts.py`.
	- Real-hardware baselines require `ROCM_AVAILABLE=true` and captured `hipcc`/`rocprof` logs.
	- Never mix `demo_artifact` and `real_rocm` numbers in the same result table.
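
The no-mixing rule is easy to enforce before a table is shown. This is a minimal sketch, assuming each result row is a dict with a `data_source` key. The row layout and the `check_single_source` helper are hypothetical; only the two labels come from this document.

```python
# The two labels named in the baseline policy.
ALLOWED_SOURCES = {"demo_artifact", "real_rocm"}


def check_single_source(rows):
    """Return the one data source shared by every row, or raise ValueError.

    Returns None for an empty table.
    """
    sources = set()
    for row in rows:
        source = row.get("data_source")
        if source not in ALLOWED_SOURCES:
            raise ValueError(f"row has missing or unknown data_source: {row!r}")
        sources.add(source)
    if len(sources) > 1:
        raise ValueError(f"table mixes data sources: {sorted(sources)}")
    return sources.pop() if sources else None


if __name__ == "__main__":
    table = [
        {"kernel": "vector_add", "data_source": "demo_artifact"},
    ]
    print(check_single_source(table))
```

Raising instead of returning a flag keeps a mixed table from reaching a slide or report unnoticed.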

	## Visible Artifacts In Current UI

	- CUDA source input.
	- Agent event stream.
	- Tester summary: execution time, bandwidth utilization, bottleneck, notes.
	- Final summary footer: changes made, critical bugs found, compile/migration success, data source.

	## Additional Artifacts Available By API

	- `/benchmark-report`: reproducible benchmark summary and static risk scans.
	- `/export`: migration diff, original CUDA, optimized HIP, and report markdown.
	- `/demo-kernels`: source for bundled demo kernels.
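
These endpoints can also be queried from a script during evaluation. The sketch below assumes they answer plain `GET` requests with a JSON body and that the backend runs at the host and port from step 1 of the flow. The HTTP method and the response format are not confirmed by this document, and `/export` is left out because it may need a request payload.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # host and port taken from step 1 of the flow


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path into a full URL."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(path, base_url=BASE_URL, timeout=10):
    """GET an endpoint and decode the body as JSON.

    Assumption: the endpoint accepts GET and returns JSON.
    """
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the backend to be running locally.
    report = fetch_json("/benchmark-report")
    print(json.dumps(report, indent=2))
```

Saving the printed report next to the demo recording gives judges a fixed artifact to compare against what the UI displayed.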

	## Pass/Fail Criteria

	A demo is credible if all of the following hold:

	- Every speedup is tied to its `data_source`.
	- The baseline definition is stated before any speedup is shown.
	- Static risk findings match the analyzer event or `/benchmark-report`.
	- At least one non-perfect or limited-gain case is included.
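
For the limited-gain case, a short roofline calculation shows why `vector_add` is expected to be bandwidth-bound. This sketch assumes `float32` elements, the usual `c[i] = a[i] + b[i]` form, and no cache reuse. The peak figures in the usage line are round placeholders, not the specifications of any real GPU.

```python
def arithmetic_intensity(flops_per_element, bytes_per_element):
    """FLOPs performed per byte moved to or from memory."""
    return flops_per_element / bytes_per_element


def roofline_bound(intensity, peak_flops, peak_bandwidth):
    """Return (attainable FLOP/s, limiting resource) under the roofline model."""
    bandwidth_limit = intensity * peak_bandwidth
    if bandwidth_limit < peak_flops:
        return bandwidth_limit, "bandwidth"
    return peak_flops, "compute"


# vector_add on float32: one add per element, two 4-byte loads and one 4-byte store.
VECTOR_ADD_INTENSITY = arithmetic_intensity(flops_per_element=1, bytes_per_element=3 * 4)


if __name__ == "__main__":
    # Placeholder peaks: 10 TFLOP/s compute, 1 TB/s memory bandwidth.
    attainable, limit = roofline_bound(VECTOR_ADD_INTENSITY, 10e12, 1e12)
    print(f"intensity = {VECTOR_ADD_INTENSITY:.4f} FLOP/byte, limited by {limit}")
```

At 1/12 FLOP per byte, memory traffic sets the ceiling on any hardware whose compute-to-bandwidth ratio exceeds that figure, so a port that already moves the minimum number of bytes has little room for speedup.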