Spaces: Nomearod/agentbench (Running)

agentbench / tests / evaluation
74.2 kB
  • 4 contributors
History: 20 commits
Nomearod
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
ab0e054 3 days ago
  • fixtures
    fix(judges): four review-blocking bugs (review items 1–4 + 8) 4 days ago
  • __init__.py
    0 Bytes
    test: scaffold tests/evaluation/ directory for judge-layer tests 4 days ago
  • test_calibration_metrics.py
    5.98 kB
    fix(judges,calibration): five review follow-ups (items 5, 6, 7, 9, 10) 4 days ago
  • test_calibration_report.py
    9.2 kB
    fix(judges,calibration): five review follow-ups (items 5, 6, 7, 9, 10) 4 days ago
  • test_harness_migration.py
    6.74 kB
    fix(judges,calibration,harness): three Codex adversarial-review findings 4 days ago
  • test_judges.py
    32.5 kB
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 3 days ago
  • test_jury_aggregation.py
    10.9 kB
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 3 days ago
  • test_rubric_loading.py
    4.4 kB
    fix(judges): four review-blocking bugs (review items 1–4 + 8) 4 days ago