Commit History
calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause ab0e054
rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20) e16544c unverified
Jane Yeung Claude Opus 4.7 (1M context) committed on
docs(plans): judge-layer v1 implementation plan - 12 phases, ~50 tasks 171022a
docs(plans): judge-layer v1 design - supersede continuous-scale judges with discrete-anchored 2-judge jury + κ-validated calibration 44c65d4
docs(plan): add Part A OWASP mapping implementation plan ad918a7
docs(plan): Part A design self-review fixes (LLM02 consistency, anti-padding template, paired-review gate) cc8331d
docs(plan): add Part A OWASP LLM Top 10 (2025) mapping design 7c08d23
docs: multi-corpus refactor implementation plan a5fc1f3
docs: multi-corpus refactor design 31f0ada
style: fix ruff lint - import sorting, line length 12a17f8
docs: add known limitations and future work for self-hosted benchmark 79e4ae8
docs: deepen self-hosted analysis in provider comparison 04cb97f
feat: infrastructure sprint - vLLM/Modal, Helm, Terraform (#8) a9d4375
Jane Yeung Claude Opus 4.6 (1M context) committed on