Spaces: Nomearod/agentbench
agentbench/docs/plans (branch: main)
780 kB
4 contributors
History: 8 commits
Nomearod
docs(plans): judge-layer v1 implementation plan - 12 phases, ~50 tasks
171022a
5 days ago
2026-03-24-day1-repo-provider.md
Safe
31.8 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-24-v2-implementation-plan.md
Safe
11.8 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-25-v2-revised-design.md
Safe
18.8 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-27-langchain-baseline.md
Safe
40.3 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-30-infra-sprint-design.md
Safe
23.8 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-30-infra-sprint-implementation.md
Safe
54.3 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-31-security-hardening-design.md
Safe
14.5 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-03-31-security-hardening-implementation.md
Safe
70.9 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-04-10-showcase-ui-design.md
Safe
17.4 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-04-10-sse-stage-events-implementation.md
Safe
54.4 kB
style: fix ruff lint - import sorting, line length
28 days ago
2026-04-12-multi-corpus-refactor-design.md
Safe
20.9 kB
docs: multi-corpus refactor design
27 days ago
2026-04-12-multi-corpus-refactor-implementation.md
Safe
52.4 kB
docs: multi-corpus refactor implementation plan
27 days ago
2026-04-15-owasp-llm-top-10-mapping-design.md
Safe
39 kB
docs(plan): Part A design self-review fixes (LLM02 consistency, anti-padding template, paired-review gate)
24 days ago
2026-04-15-owasp-llm-top-10-mapping-implementation.md
Safe
58.7 kB
docs(plan): add Part A OWASP mapping implementation plan
24 days ago
2026-05-04-judge-layer-v1-design.md
Safe
46.5 kB
docs(plans): judge-layer v1 design - supersede continuous-scale judges with discrete-anchored 2-judge jury + κ-validated calibration
5 days ago
2026-05-04-judge-layer-v1-implementation.md
Safe
225 kB
docs(plans): judge-layer v1 implementation plan - 12 phases, ~50 tasks
5 days ago