Spaces: Nomearod/agentbench
Branch: main
Path: agentbench/scripts (78.2 kB)
4 contributors
History: 19 commits
Nomearod | calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific | 504a35c | 1 day ago
Name | Size | Last commit | Updated
_dev | - | calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific | 1 day ago
benchmark.py | 1.84 kB | feat: Day 7 - evaluation harness, metrics, report, expanded golden dataset | about 1 month ago
evaluate.py | 6.24 kB | feat: evaluate.py --corpus flag + CorpusConfig.golden_dataset | 24 days ago
ingest.py | 3.99 kB | fix(ingest): exclude QUESTION_PLAN.md from corpus ingestion | 23 days ago
run_calibration.py | 22.2 kB | calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause | 1 day ago
run_langchain_eval.py | 5.67 kB | fix: deferred imports, match iteration budget, token cost tracking | about 1 month ago
verify_retrieval.py | 3.6 kB | feat: add reproducible retrieval gate check with committed artifact | about 1 month ago