Commit History
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause ab0e054
rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) e16544c unverified
Jane Yeung Claude Opus 4.7 (1M context) committed on
docs+build: judge-layer v1 coupled-artifact updates 508e5ef
docs(decisions): promote cold-start falsified-assumption and audit-path incident entries, add three-regimes latency refinement 6409a40
docs(decisions): add entry on named residual risks and scope limits verdict discipline 2e8274b
docs+test: round-2 incident response - Google API key format scrub 4dc3e01
docs: incident response + SHA remap after credential-exposure history rewrite 168d3e1
docs: defer HF Space rename - outstanding applications reference current URL 5d4b3fe
docs: step 8.1 - tagline reframe + README honest-scope + rename closure 086ad86
feat(eval): K8s refusal_threshold sweep against 25Q set - 0.015 validated 2d1d822
docs: step 5 follow-up β parallel-tracks list + post-authoring observations 05bf702
feat(eval): Week 1 step 5 - 25-question K8s golden dataset + grounded_refusal fix 4454894
docs: Phase 1 gate closure + stale-wording corrections (cross-cutting #3) 23de799
docs(eval): Fix 2 SearchTool query expansion β attempted and reverted 27c2e17
docs(eval): Fix 1 counterfactual prompt clause β attempted and reverted 213da36
chore(eval): pin gpt-4o-mini snapshot + wire fastapi golden_dataset + pre-commit tolerances 5c1f49f
feat(eval): K8s refusal_threshold 0.02 → 0.015 empirical calibration 125dac0
feat(eval): K8s first pilot run + flavor-B empirical decisions 2439025
docs: decisions for multi-corpus refactor 361d65d
docs: add decisions for monitor mode, SSE events, vanilla JS 77e1875
docs: fix test count in Testing section, add auth decision, reorder entries 9a8ca07
docs: add security architecture section to README and DECISIONS.md f7bb777
feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8) a9d4375
Jane Yeung Claude Opus 4.6 (1M context) committed on