Commit History

fix(types): four mypy errors blocking CI
02b8717

Nomearod Claude Opus 4.7 (1M context) committed on

refactor(harness): migrate to per-dimension Judge layer (drop faithfulness/correctness)
e76227f

Nomearod Claude Opus 4.7 (1M context) committed on

fix: comparison framing, mock-specific failure analysis, stale test counts
a29d68d

Nomearod Claude Opus 4.6 (1M context) committed on

fix: grounded refusal checks no-sources, reference_answer for judge, mock disclaimer
520796c

Nomearod Claude Opus 4.6 (1M context) committed on

fix: retrieval metrics use ranked sources, LLM judge wired, report complete
3d027cb

Nomearod Claude Opus 4.6 (1M context) committed on

feat: Day 7 — evaluation harness, metrics, report, expanded golden dataset
c378584

Nomearod Claude Opus 4.6 (1M context) committed on