agentbench/tests/evaluation/test_harness_migration.py

Commit History

fix(judges,calibration,harness): three Codex adversarial-review findings
226b6f4

Nomearod and Claude Opus 4.7 (1M context) committed

fix(judges): four review-blocking bugs (review items 1–4 + 8)
9255fb5

Nomearod and Claude Opus 4.7 (1M context) committed

feat(config): add evaluation.judge_dimensions field
12cb8b7

Nomearod and Claude Opus 4.7 (1M context) committed