agentbench / scripts /run_calibration.py

Commit History

calibrate(jury): v1.1+v1.1.1 — fix weighting bugs; recency-position paraphrase clause
ab0e054

Nomearod Claude Opus 4.7 (1M context) commited on

rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20)
e16544c
unverified

Jane Yeung Claude Opus 4.7 (1M context) commited on

fix(calibration): per-corpus dispatch in generate-outputs (#19)
ee729e0
unverified

Jane Yeung Claude Opus 4.7 (1M context) commited on

fix(judges,calibration,harness): three Codex adversarial-review findings
226b6f4

Nomearod Claude Opus 4.7 (1M context) commited on

fix(judges,calibration): five review follow-ups (items 5, 6, 7, 9, 10)
71ec5e8

Nomearod Claude Opus 4.7 (1M context) commited on

feat(scripts): run_calibration.py orchestrator for Steps A/C/D
4fa7c61

Nomearod Claude Opus 4.7 (1M context) commited on