# Judge Mode Walkthrough

Use this sequence during technical evaluation with the current React UI and the FastAPI server-sent events (SSE) stream.

## Goal

Make every claim falsifiable and tied to fields returned by the backend.

## Flow

1. Open http://localhost:8000/index.html.
2. Choose or paste a CUDA kernel.
3. Run ROCmPort AI and watch the five agent cards: analyzer, translator, optimizer, tester, and coordinator.
4. Confirm that the tester event reports speedup, bandwidth, bottleneck, and data source.
5. Confirm that the coordinator event includes the final report JSON in its SSE detail.
6. Call `/benchmark-report` for reproducible demo-artifact metrics and data-source labels.
7. Show a limited-gain case such as `vector_add` and explain that its gain is capped because the kernel is memory-bandwidth-bound.
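
You can also script the tester check against a saved copy of the stream instead of reading the cards by eye. Below is a minimal Python sketch. It assumes that each SSE event's `data:` payload is a JSON object with an `agent` key. It also assumes the tester event exposes `speedup`, `bandwidth`, `bottleneck`, and `data_source` as top-level keys. Those key names are hypothetical and do not come from the documented backend schema. `parse_sse` and `missing_tester_fields` are illustrative helpers, not part of ROCmPort AI.

```python
import json

# Hypothetical event shape: an "agent" key plus tester fields. These key names
# are assumptions for illustration, not the documented ROCmPort AI schema.
REQUIRED_TESTER_FIELDS = ("speedup", "bandwidth", "bottleneck", "data_source")


def parse_sse(text: str) -> list[dict]:
    """Split a captured SSE body into events, JSON-decoding each event's data."""
    events: list[dict] = []
    data_lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line ends the event; multi-line data is joined with newlines.
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
        # Other lines (":" comments, event:, id:, retry:) are ignored here.
    # Like a browser EventSource, drop a trailing event that was never terminated.
    return events


def missing_tester_fields(events: list[dict]) -> list[str]:
    """List required fields absent from the tester event (all of them if there is none)."""
    tester = next((e for e in events if e.get("agent") == "tester"), None)
    if tester is None:
        return list(REQUIRED_TESTER_FIELDS)
    return [field for field in REQUIRED_TESTER_FIELDS if field not in tester]
```

A capture saved with `curl -N` from the stream endpoint (its path is not given in this document) should yield an empty list from `missing_tester_fields` when the tester event is complete.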

## Baseline Policy

- Primary baseline: straight hipify output with only the minimal edits required to compile.
- Demo-mode baselines come from `backend/tools/demo_artifacts.py`.
- Real-hardware baselines require `ROCM_AVAILABLE=true` and captured `hipcc`/`rocprof` logs.
- Never mix `demo_artifact` and `real_rocm` numbers in the same result table.
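
The baseline policy reduces to two mechanical checks. First, speedup is a ratio against the hipify baseline. Second, each table carries exactly one data source. Below is a minimal Python sketch that uses a hypothetical `ResultRow` record with per-kernel timings in milliseconds. Its field names and the `table_source` helper are illustrative; they are not taken from `demo_artifacts.py`.

```python
from dataclasses import dataclass


@dataclass
class ResultRow:
    """One hypothetical result-table row; field names are illustrative only."""

    kernel: str
    baseline_ms: float   # runtime of straight hipify output (the primary baseline)
    optimized_ms: float  # runtime of the optimized HIP kernel
    data_source: str     # "demo_artifact" or "real_rocm"

    @property
    def speedup(self) -> float:
        # Ratio against the hipify baseline; above 1.0 means the optimized kernel is faster.
        return self.baseline_ms / self.optimized_ms


def table_source(rows: list[ResultRow]) -> str:
    """Return the single data source shared by every row, or raise if sources are mixed."""
    sources = {row.data_source for row in rows}
    if not sources:
        raise ValueError("result table is empty")
    if len(sources) > 1:
        raise ValueError(f"result table mixes data sources: {sorted(sources)}")
    return sources.pop()
```

If you build one table per source and call `table_source` before rendering, demo-artifact and real ROCm numbers never end up in the same table.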

## Visible Artifacts in the Current UI

- CUDA source input.
- Agent event stream.
- Tester summary: execution time, bandwidth utilization, bottleneck, and notes.
- Final summary footer: changes made, critical bugs found, compile/migration success, and data source.

## Additional Artifacts Available via the API

- `/benchmark-report`: reproducible benchmark summary and static risk scans.
- `/export`: migration diff, original CUDA, optimized HIP, and report markdown.
- `/demo-kernels`: source for the bundled demo kernels.

## Pass/Fail Criteria

A demo is credible if:

- Every speedup is tied to its `data_source`.
- The baseline definition is stated before any speedup is shown.
- Static risk findings match the analyzer event or `/benchmark-report`.
- At least one non-perfect or limited-gain case is included.
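
You can turn the pass/fail criteria into a pre-demo checklist. This sketch assumes a hypothetical report dictionary whose keys (`results`, `baseline`, `shown_risks`, `analyzer_risks`, `benchmark_report_risks`) are invented for illustration. The `limited_gain_threshold` cutoff is also an arbitrary assumption, because the criteria do not define "limited gain" numerically. The baseline check only confirms that a definition exists. Stating it before the speedup is still the presenter's job.

```python
# Hypothetical report shape: every key used here is an assumption for
# illustration, not the ROCmPort AI report schema.


def credibility_failures(report: dict, limited_gain_threshold: float = 1.2) -> list[str]:
    """Return the criteria the report violates; an empty list means it passes.

    limited_gain_threshold is an arbitrary illustrative cutoff: a speedup
    below it counts as a limited-gain case.
    """
    failures: list[str] = []
    results = report.get("results", [])

    # 1. Every speedup must carry its data_source.
    for r in results:
        if "speedup" in r and not r.get("data_source"):
            failures.append(f"{r.get('kernel', '?')}: speedup without data_source")

    # 2. A baseline definition must be present (ordering is not checked here).
    if not report.get("baseline"):
        failures.append("baseline definition missing")

    # 3. Each shown risk finding must appear in the analyzer event or /benchmark-report.
    known = set(report.get("analyzer_risks", [])) | set(report.get("benchmark_report_risks", []))
    for risk in report.get("shown_risks", []):
        if risk not in known:
            failures.append(f"unbacked risk finding: {risk}")

    # 4. At least one limited-gain case must be included.
    if not any(r.get("speedup", float("inf")) < limited_gain_threshold for r in results):
        failures.append("no limited-gain case")

    return failures
```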