agentbench / docs / plans

Commit History

docs(plans): judge-layer v1 implementation plan — 12 phases, ~50 tasks
171022a

Nomearod Claude Opus 4.7 (1M context) committed on

docs(plans): judge-layer v1 design — supersede continuous-scale judges with discrete-anchored 2-judge jury + κ-validated calibration
44c65d4

Nomearod Claude Opus 4.7 (1M context) committed on

docs(plan): add Part A OWASP mapping implementation plan
ad918a7

Nomearod Claude Opus 4.6 (1M context) committed on

docs(plan): Part A design self-review fixes (LLM02 consistency, anti-padding template, paired-review gate)
cc8331d

Nomearod Claude Opus 4.6 (1M context) committed on

docs(plan): add Part A OWASP LLM Top 10 (2025) mapping design
7c08d23

Nomearod Claude Opus 4.6 (1M context) committed on

docs: multi-corpus refactor implementation plan
a5fc1f3

Nomearod Claude Opus 4.6 (1M context) committed on

docs: multi-corpus refactor design
31f0ada

Nomearod Claude Opus 4.6 (1M context) committed on

style: fix ruff lint — import sorting, line length
12a17f8

Nomearod Claude Opus 4.6 (1M context) committed on