---
title: AEC-Bench
sdk: static
---
# AEC-Bench

[AEC-Bench](https://github.com/TheodoreGalanos/aec-bench) is an open benchmark and Python toolkit for evaluating agentic AI systems on realistic Architecture, Engineering, and Construction tasks.

The project combines generated engineering tasks, executable verifiers, model rollout ledgers, and trace artifacts. Evaluation results can therefore be inspected beyond a single leaderboard score, broken down by task family, difficulty, information visibility, tool use, cost, and failure mode.
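
Breaking results down by task family, difficulty, or cost amounts to grouping rollout records by one attribute and aggregating their outcomes. The sketch below illustrates that idea under stated assumptions: the `Rollout` fields, the `summarize` helper, and the toy records are hypothetical and do not reflect the actual ledger schema or any aec-bench API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical record shape; the real rollout ledger schema may differ.
    model: str
    task_family: str
    difficulty: str
    passed: bool      # whether the task's verifier accepted the result
    cost_usd: float


def summarize(rollouts, key):
    """Group rollouts by the attribute named `key` and aggregate each slice."""
    groups = defaultdict(list)
    for r in rollouts:
        groups[getattr(r, key)].append(r)
    summary = {}
    for value, rs in groups.items():
        n = len(rs)
        summary[value] = {
            "n": n,
            "pass_rate": sum(r.passed for r in rs) / n,
            "mean_cost_usd": sum(r.cost_usd for r in rs) / n,
        }
    return summary


# Toy records for illustration only; these are not benchmark results.
toy = [
    Rollout("model_x", "family_a", "easy", True, 0.10),
    Rollout("model_x", "family_a", "hard", False, 0.30),
    Rollout("model_x", "family_b", "easy", True, 0.20),
]

for family, stats in summarize(toy, "task_family").items():
    print(family, stats)
```

Passing a different attribute name, such as `"difficulty"`, yields a different slice of the same records without changing the aggregation logic.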

- Paper: [AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction](https://arxiv.org/abs/2603.29199)
- Code: [TheodoreGalanos/aec-bench](https://github.com/TheodoreGalanos/aec-bench)
- First release dataset: [aec-bench/release-model-rollouts](https://huggingface.co/datasets/aec-bench/release-model-rollouts)

This organisation hosts datasets, rollout artifacts, and benchmark releases for the AEC-Bench project.