calibrate(jury): v1.1+v1.1.1 – fix weighting bugs; recency-position paraphrase clause ab0e054 Nomearod Claude Opus 4.7 (1M context) committed 5 days ago
fix(judges,calibration,harness): three Codex adversarial-review findings 226b6f4 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
fix(judges,calibration): five review follow-ups (items 5, 6, 7, 9, 10) 71ec5e8 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
fix(judges): four review-blocking bugs (review items 1–4 + 8) 9255fb5 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(config): add evaluation.judge_dimensions field 12cb8b7 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(calibration): generate_kappa_table with strict/warn modes 1d47106 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(scripts): run_calibration.py orchestrator for Steps A/C/D 4fa7c61 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
test(calibration): sklearn-parity fixtures + cross-check CI test 3a2ed35 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(calibration): hand-rolled cohen_kappa, gwets_ac2, bootstrap_ci 6ef2e0e Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(variance): PermutedJudge + Jury – N permutations and multi-judge aggregator c038a7d Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): CitationFaithfulnessJudge with all-or-nothing aggregation 04d9ea0 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): CompletenessJudge + three-point reference-based rubric 80be2d8 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): RelevanceJudge + anchored three-point rubric b170eb6 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): GroundednessJudge + anchored binary rubric 30a5e0c Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): _call_judge_with_retry helper with strict-reprompt + abstain ff78845 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): MockJudge with LookupError on missing keys aa70e89 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): Judge ABC with judge_id derived from model + dimension 2192305 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): Rubric markdown loader with aggressive validation 7b72b2c Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
feat(judges): ScoreResult + abstain-reason constants 76e370c Nomearod Claude Opus 4.7 (1M context) committed 7 days ago
test: scaffold tests/evaluation/ directory for judge-layer tests f94cea7 Nomearod Claude Opus 4.7 (1M context) committed 7 days ago