Spaces: Nomearod/agentbench
agentbench/agent_bench/evaluation (branch: main)
4 contributors
History: 34 commits
Nomearod
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
ab0e054
8 days ago
| Name | Scan status | Size | Last commit | Updated |
|---|---|---|---|---|
| calibration | | | rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) | 9 days ago |
| datasets | | | feat(goldens): add source_snippets to 8 FastAPI calibration items | 10 days ago |
| judges | | | calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause | 8 days ago |
| rubrics | | | rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) | 9 days ago |
| variance | | | calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause | 8 days ago |
| __init__.py | Safe | 50 Bytes | feat: Day 4 - corpus, ingest script, first 10 golden questions | about 2 months ago |
| harness.py | Safe | 8.92 kB | fix(types): four mypy errors blocking CI | 10 days ago |
| metrics.py | Safe | 4.84 kB | refactor(metrics): delete superseded LLM judges (answer_faithfulness etc.) | 10 days ago |
| report.py | Safe | 9.55 kB | fix(types): four mypy errors blocking CI | 10 days ago |