| <!doctype html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <title>AEC-Bench</title> |
| <link rel="stylesheet" href="style.css" /> |
| </head> |
| <body> |
| <main> |
| <h1>AEC-Bench</h1> |
| <p> |
| <a href="https://github.com/TheodoreGalanos/aec-bench" target="_blank" rel="noreferrer">AEC-Bench</a> |
| is an open benchmark and Python toolkit for evaluating agentic AI systems on realistic |
| Architecture, Engineering, and Construction (AEC) tasks. |
| </p> |
| <p> |
| The project combines generated engineering tasks, executable verifiers, model rollout ledgers, |
| and trace artifacts, so results can be broken down beyond a single leaderboard score: by task |
| family, difficulty, information visibility, tool use, cost, and failure mode. |
| </p> |
| <nav aria-label="AEC-Bench links"> |
| <a href="https://arxiv.org/abs/2603.29199" target="_blank" rel="noreferrer">Paper</a> |
| <a href="https://github.com/TheodoreGalanos/aec-bench" target="_blank" rel="noreferrer">Code</a> |
| <a href="https://huggingface.co/datasets/aec-bench/release-model-rollouts" target="_blank" rel="noreferrer">Release dataset</a> |
| </nav> |
| <p class="footer"> |
| This organisation hosts datasets, rollout artifacts, and benchmark releases for the AEC-Bench project. |
| </p> |
| </main> |
| </body> |
| </html> |