Space: Nomearod/agentbench
Branch: main
4 contributors, 246 commits
Latest commit 4158bba (Nomearod, about 23 hours ago): Merge remote-tracking branch 'origin/main' into hf-deploy
Directories (name, last updated, latest commit message):
.github/       3 days ago          ci: document zero-secret contract on test job with empty env block
agent_bench/   about 24 hours ago  dashboard: add #harness + #harness-appendix sections (v3 design integration)
configs/       1 day ago           calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
data/          23 days ago         docs: step 5 follow-up - parallel-tracks list + post-authoring observations
docker/        about 1 month ago   feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
docs/          1 day ago           docs(judge): writeup draft v1 - methodology arc + position + v1.2 fix-list
k8s/           about 1 month ago   feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
measurements/  1 day ago           calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
modal/         about 1 month ago   feat(security): add Modal DeBERTa injection classifier deployment
results/       1 day ago           calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
scripts/       1 day ago           calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
terraform/     about 1 month ago   feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
tests/         1 day ago           calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
Files (name, size, last updated, latest commit message):
.dockerignore   211 Bytes  about 1 month ago   feat: Day 9 - Docker deployment with Dockerfile and docker-compose
.gitignore      1.03 kB    2 days ago          rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20)
DECISIONS.md    156 kB     1 day ago           calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
Dockerfile      1.25 kB    22 days ago         fix(docker): create and chown logs/ for runtime audit writes
Makefile        3.5 kB     3 days ago          docs+build: judge-layer v1 coupled-artifact updates
README.md       17 kB      about 23 hours ago  Merge remote-tracking branch 'origin/main' into hf-deploy
SECURITY.md     6.9 kB     15 days ago         docs(security): LLM07 named residual risk - injection classifier coverage gap
pyproject.toml  1.37 kB    3 days ago          chore(tooling): exclude scripts/_dev/ from ruff and mypy