Spaces: Nomearod / agentbench (Running)

agentbench / agent_bench
  • 4 contributors
History: 126 commits
Nomearod
dashboard: add #harness + #harness-appendix sections (v3 design integration)
2d9ce3a 3 days ago
  • agents
    fix: batch-3 adversarial review findings 27 days ago
  • core
    fix(judges,calibration,harness): three Codex adversarial-review findings 5 days ago
  • evaluation
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 4 days ago
  • langchain_baseline
    feat(eval): Week 1 step 5 - 25-question K8s golden dataset + grounded_refusal fix 25 days ago
  • memory
    feat: add SQLite conversation sessions with session_id about 2 months ago
  • rag
    feat: expose reranker scores through retrieval pipeline 29 days ago
  • security
    fix(audit): catch all write errors so audit failures can't crash requests 25 days ago
  • serving
    dashboard: add #harness + #harness-appendix sections (v3 design integration) 3 days ago
  • tools
    style: fix ruff lint - import sorting, line length 29 days ago
  • __init__.py
    92 Bytes
    fix: add __version__ to package init for CI smoke test about 2 months ago