agentbench / configs

Commit History

calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause
ab0e054

Nomearod Claude Opus 4.7 (1M context) committed on

fix(judges,calibration,harness): three Codex adversarial-review findings
226b6f4

Nomearod Claude Opus 4.7 (1M context) committed on

feat(calibration): six row configs for the κ ablation table
cf57f16

Nomearod Claude Opus 4.7 (1M context) committed on

feat(eval): K8s refusal_threshold sweep against 25Q set - 0.015 validated
2d1d822

Nomearod Claude Opus 4.6 (1M context) committed on

feat(eval): Week 1 step 5 - 25-question K8s golden dataset + grounded_refusal fix
4454894

Nomearod Claude Opus 4.6 (1M context) committed on

chore(eval): pin gpt-4o-mini snapshot + wire fastapi golden_dataset + pre-commit tolerances
5c1f49f

Nomearod Claude Opus 4.6 (1M context) committed on

feat(eval): K8s refusal_threshold 0.02 → 0.015 empirical calibration
125dac0

Nomearod Claude Opus 4.6 (1M context) committed on

feat: K8s pilot corpus - 8 pages + config entry + JSON rewrite
ce7247c

Nomearod Claude Opus 4.6 (1M context) committed on

fix: batch-3 adversarial review findings
42c7303

Nomearod committed on

feat: K8s corpus config entry, ingestion target, curation policy
3c0089e

Nomearod committed on

feat(security): add security config models to AppConfig
4717d76

Nomearod Claude Opus 4.6 (1M context) committed on

feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8)
a9d4375

Jane Yeung Claude Opus 4.6 (1M context) committed on

feat: Anthropic Haiku benchmark + README with provider comparison
ade4c8b

Nomearod Claude Opus 4.6 (1M context) committed on

feat: add SQLite conversation sessions with session_id
9874438

Nomearod Claude Opus 4.6 (1M context) committed on

fix: production config with reranker disabled for 512MB free tier
74aca65

Nomearod Claude Opus 4.6 (1M context) committed on

chore: surface retry and rate_limit_rpm config in default.yaml
f7dd169

Nomearod Claude Opus 4.6 (1M context) committed on

feat: add cross-encoder reranking with feature flag
65d5480

Nomearod Claude Opus 4.6 (1M context) committed on

feat: add grounded refusal gate based on retrieval score threshold
c410788

Nomearod Claude Opus 4.6 (1M context) committed on

feat: Day 1 - repo scaffolding, provider abstraction, config, tests
ef5d585

Nomearod Claude Opus 4.6 (1M context) committed on