agentbench/tests/evaluation/test_jury_aggregation.py

Commit History

calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
ab0e054

Nomearod Claude Opus 4.7 (1M context) committed on

fix(judges): four review-blocking bugs (review items 1–4 + 8)
9255fb5

Nomearod Claude Opus 4.7 (1M context) committed on

feat(variance): PermutedJudge + Jury - N permutations and multi-judge aggregator
c038a7d

Nomearod Claude Opus 4.7 (1M context) committed on