Spaces: Nomearod/agentbench

agentbench / agent_bench / evaluation / calibration

19.3 kB

4 contributors

History: 4 commits

Latest commit: Jane Yeung | rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) | e16544c (unverified) | 22 days ago

__init__.py | 246 Bytes | feat(calibration): hand-rolled cohen_kappa, gwets_ac2, bootstrap_ci | 23 days ago
metrics.py | 5.75 kB | fix(judges,calibration): five review follow-ups (items 5, 6, 7, 9, 10) | 23 days ago
report.py | 13.3 kB | rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) | 22 days ago