Spaces: Nomearod/agentbench

agentbench/agent_bench/evaluation

168 kB

4 contributors

History: 35 commits

Deploy: dashboard naive-vs-rigorous reveal + sync serving to origin/main (1c08abf, about 9 hours ago)

Name | Size | Last commit | Updated
calibration | - | rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) | about 2 months ago
datasets | - | feat(goldens): add source_snippets to 8 FastAPI calibration items | about 2 months ago
judges | - | Deploy: dashboard naive-vs-rigorous reveal + sync serving to origin/main | about 9 hours ago
rubrics | - | rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) | about 2 months ago
variance | - | calibrate(jury): v1.1+v1.1.1 — fix weighting bugs; recency-position paraphrase clause | about 2 months ago
__init__.py | 50 Bytes | feat: Day 4 — corpus, ingest script, first 10 golden questions | 3 months ago
harness.py | 8.92 kB | fix(types): four mypy errors blocking CI | about 2 months ago
metrics.py | 5.24 kB | Deploy: dashboard naive-vs-rigorous reveal + sync serving to origin/main | about 9 hours ago
report.py | 9.55 kB | fix(types): four mypy errors blocking CI | about 2 months ago