Space: Nomearod/agentbench

agentbench / agent_bench / evaluation
4 contributors · History: 34 commits
Latest commit: ab0e054 by Nomearod · 8 days ago
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
  • calibration/ · 9 days ago
    rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20)
  • datasets/ · 10 days ago
    feat(goldens): add source_snippets to 8 FastAPI calibration items
  • judges/ · 8 days ago
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
  • rubrics/ · 9 days ago
    rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20)
  • variance/ · 8 days ago
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
  • __init__.py · 50 Bytes · about 2 months ago
    feat: Day 4 - corpus, ingest script, first 10 golden questions
  • harness.py · 8.92 kB · 10 days ago
    fix(types): four mypy errors blocking CI
  • metrics.py · 4.84 kB · 10 days ago
    refactor(metrics): delete superseded LLM judges (answer_faithfulness etc.)
  • report.py · 9.55 kB · 10 days ago
    fix(types): four mypy errors blocking CI