feat: evaluate.py --corpus flag + CorpusConfig.golden_dataset 68d96ea Nomearod Claude Opus 4.6 (1M context) committed on 24 days ago
feat: add cross-encoder reranking with feature flag 65d5480 Nomearod Claude Opus 4.6 (1M context) committed on Mar 25
feat: add grounded refusal gate based on retrieval score threshold c410788 Nomearod Claude Opus 4.6 (1M context) committed on Mar 25
fix: retrieval metrics use ranked sources, LLM judge wired, report complete 3d027cb Nomearod Claude Opus 4.6 (1M context) committed on Mar 24
feat: Day 7 — evaluation harness, metrics, report, expanded golden dataset c378584 Nomearod Claude Opus 4.6 (1M context) committed on Mar 24