agentbench / results
1.67 MB
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
ab0e054