Spaces: Nomearod / agentbench

agentbench / scripts / benchmark.py

Commit History

feat: Day 7 — evaluation harness, metrics, report, expanded golden dataset

c378584

Nomearod Claude Opus 4.6 (1M context) committed on Mar 24