Hugging Face
Spaces: Nomearod/agentbench
main
agentbench
4.67 MB
4 contributors
History:
246 commits
Nomearod
Merge remote-tracking branch 'origin/main' into hf-deploy
4158bba
about 2 months ago
.github
ci: document zero-secret contract on test job with empty env block
about 2 months ago
agent_bench
dashboard: add #harness + #harness-appendix sections (v3 design integration)
about 2 months ago
configs
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
about 2 months ago
data
docs: step 5 follow-up - parallel-tracks list + post-authoring observations
2 months ago
docker
feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
3 months ago
docs
docs(judge): writeup draft v1 - methodology arc + position + v1.2 fix-list
about 2 months ago
k8s
feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
3 months ago
measurements
calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
about 2 months ago
modal
feat(security): add Modal DeBERTa injection classifier deployment
3 months ago
results
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
about 2 months ago
scripts
calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
about 2 months ago
terraform
feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
3 months ago
tests
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
about 2 months ago
.dockerignore
Safe
211 Bytes
feat: Day 9 - Docker deployment with Dockerfile and docker-compose
3 months ago
.gitignore
1.03 kB
rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20)
about 2 months ago
DECISIONS.md
156 kB
calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
about 2 months ago
Dockerfile
1.25 kB
fix(docker): create and chown logs/ for runtime audit writes
2 months ago
Makefile
3.5 kB
docs+build: judge-layer v1 coupled-artifact updates
about 2 months ago
README.md
17 kB
Merge remote-tracking branch 'origin/main' into hf-deploy
about 2 months ago
SECURITY.md
6.9 kB
docs(security): LLM07 named residual risk - injection classifier coverage gap
2 months ago
pyproject.toml
1.37 kB
chore(tooling): exclude scripts/_dev/ from ruff and mypy
about 2 months ago