agentbench / README.md

Commit History

Merge remote-tracking branch 'origin/main' into hf-deploy
4158bba

Nomearod committed

docs(harness,readme): two re-review must-fix items
c39d5c7

Nomearod, Claude Opus 4.7 (1M context) committed

docs+build: judge-layer v1 coupled-artifact updates
508e5ef

Nomearod, Claude Opus 4.7 (1M context) committed

docs(readme): correct test count 444 → 443
0e96cb9

Nomearod, Claude Opus 4.7 (1M context) committed

Merge origin/main into hf-deploy: Part A (OWASP) + cold-wake README
46bacd4

Nomearod committed

docs: update README cold-start timing to measured values (~2 min cold / ~5s warm)
dc08a6b

Nomearod, Claude Opus 4.7 (1M context) committed

docs(readme): link SECURITY.md OWASP mapping from Security Architecture tail
76a1458

Nomearod, Claude Opus 4.6 (1M context) committed

docs(readme): update cold-wake number with measured range
4894eb3

Nomearod, Claude Opus 4.6 (1M context) committed

deploy(hf): inject HF Spaces frontmatter into README for hf-deploy branch
6955d72

Nomearod, Claude Opus 4.6 (1M context) committed

docs: defer HF Space rename — outstanding applications reference current URL
5d4b3fe

Nomearod, Claude Opus 4.6 (1M context) committed

docs: step 8.1 — tagline reframe + README honest-scope + rename closure
086ad86

Nomearod, Claude Opus 4.6 (1M context) committed

docs: sharpen zero-hallucination claim, explain Mistral-7B row
2293da9

Nomearod, Claude Opus 4.6 (1M context) committed

docs: fix test count in Testing section, add auth decision, reorder entries
9a8ca07

Nomearod, Claude Opus 4.6 (1M context) committed

docs: refine README per review — scope boundary, config block, DECISIONS link
379b29a

Nomearod, Claude Opus 4.6 (1M context) committed

docs: add security architecture section to README and DECISIONS.md
f7bb777

Nomearod, Claude Opus 4.6 (1M context) committed

docs: sharpen README narrative for clarity
f0224d3

Nomearod, Claude Opus 4.6 (1M context) committed

feat: infrastructure sprint — vLLM/Modal, Helm, Terraform (#8)
a9d4375

Jane Yeung, Claude Opus 4.6 (1M context) committed

docs: restructure README for clarity
8875eea

Nomearod, Claude Opus 4.6 (1M context) committed

fix: comparison framing, mock-specific failure analysis, stale test counts
a29d68d

Nomearod, Claude Opus 4.6 (1M context) committed

docs: add langchain baseline comparison to README
b1863d1

Nomearod, Claude Opus 4.6 (1M context) committed

docs: add provider comparison report (OpenAI vs Anthropic Haiku)
3e490c9

Nomearod, Claude Opus 4.6 (1M context) committed

Revert "fix: restore HF Spaces frontmatter"
71f2996

Nomearod committed

fix: restore HF Spaces frontmatter
2863f68

Nomearod committed

Clean up README by removing metadata
c108bba

Jane Yeung committed

feat: Anthropic Haiku benchmark + README with provider comparison
ade4c8b

Nomearod, Claude Opus 4.6 (1M context) committed

feat: implement Anthropic Claude provider
077b821

Nomearod, Claude Opus 4.6 (1M context) committed

feat: add SQLite conversation sessions with session_id
9874438

Nomearod, Claude Opus 4.6 (1M context) committed

fix: update live demo URL to actual HF Spaces deployment
6e00d4e

Nomearod, Claude Opus 4.6 (1M context) committed

feat: switch deployment to Hugging Face Spaces (16GB free tier)
cd0c04f

Nomearod, Claude Opus 4.6 (1M context) committed

feat: Render deployment config, startup warmup, README update
55218a1

Nomearod, Claude Opus 4.6 (1M context) committed

feat: upgrade GitHub Actions CI with pip cache, type check, Docker build
c43b4b0

Nomearod, Claude Opus 4.6 (1M context) committed

fix: move gpt-4o-mini to benchmark context, highlight provider abstraction
36c307c

Nomearod, Claude Opus 4.6 (1M context) committed

feat: rewrite README for recruiter impact
9fd0c7a

Nomearod, Claude Opus 4.6 (1M context) committed

feat: sharpen README for recruiter readability
9b56692

Nomearod, Claude Opus 4.6 (1M context) committed

feat: real benchmark numbers from OpenAI gpt-4o-mini evaluation
3407aff

Nomearod, Claude Opus 4.6 (1M context) committed

fix: README dataset breakdown matches actual corpus, quick start uses make targets
57bea98

Nomearod, Claude Opus 4.6 (1M context) committed

feat: Day 8 — README with architecture, API docs, eval guide + DECISIONS.md
7920a16

Nomearod, Claude Opus 4.6 (1M context) committed