feat(eval): Week 1 step 5 — 25-question K8s golden dataset + grounded_refusal fix 4454894 Nomearod Claude Opus 4.6 (1M context) committed on Apr 14
fix: grounded refusal checks no-sources, reference_answer for judge, mock disclaimer 520796c Nomearod Claude Opus 4.6 (1M context) committed on Mar 24
fix: retrieval metrics use ranked sources, LLM judge wired, report complete 3d027cb Nomearod Claude Opus 4.6 (1M context) committed on Mar 24