Spaces: Nomearod / agentbench
agentbench / scripts
78.2 kB
  • 4 contributors
History: 19 commits
Nomearod
calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
504a35c 1 day ago
  • _dev
    calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific 1 day ago
  • benchmark.py
    1.84 kB
    feat: Day 7 - evaluation harness, metrics, report, expanded golden dataset about 1 month ago
  • evaluate.py
    6.24 kB
    feat: evaluate.py --corpus flag + CorpusConfig.golden_dataset 24 days ago
  • ingest.py
    3.99 kB
    fix(ingest): exclude QUESTION_PLAN.md from corpus ingestion 23 days ago
  • run_calibration.py
    22.2 kB
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 1 day ago
  • run_langchain_eval.py
    5.67 kB
    fix: deferred imports, match iteration budget, token cost tracking about 1 month ago
  • verify_retrieval.py
    3.6 kB
    feat: add reproducible retrieval gate check with committed artifact about 1 month ago