Commit History

Merge remote-tracking branch 'origin/main' into hf-deploy
4158bba

Nomearod committed on

dashboard: add #harness + #harness-appendix sections (v3 design integration)
2d9ce3a

Nomearod Claude Opus 4.7 (1M context) committed on

docs(judge): writeup draft v1 — methodology arc + position + v1.2 fix-list
c093a45

Nomearod Claude Opus 4.7 (1M context) committed on

calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
504a35c

Nomearod Claude Opus 4.7 (1M context) committed on

calibrate(jury): v1.1+v1.1.1 — fix weighting bugs; recency-position paraphrase clause
ab0e054

Nomearod Claude Opus 4.7 (1M context) committed on

rubric: clarify groundedness reference scope (snippets-only) for v1.1 gold (#20)
e16544c
unverified

Jane Yeung Claude Opus 4.7 (1M context) committed on

fix(calibration): per-corpus dispatch in generate-outputs (#19)
ee729e0
unverified

Jane Yeung Claude Opus 4.7 (1M context) committed on

Merge pull request #18 from tyy0811/feat/judge-layer-v1
7ca889f
unverified

Jane Yeung committed on

fix(types): four mypy errors blocking CI
02b8717

Nomearod Claude Opus 4.7 (1M context) committed on

docs(harness,readme): two re-review must-fix items
c39d5c7

Nomearod Claude Opus 4.7 (1M context) committed on

fix(judges,calibration,harness): three Codex adversarial-review findings
226b6f4

Nomearod Claude Opus 4.7 (1M context) committed on

fix(judges,calibration): five review follow-ups (items 5, 6, 7, 9, 10)
71ec5e8

Nomearod Claude Opus 4.7 (1M context) committed on

fix(judges): four review-blocking bugs (review items 1–4 + 8)
9255fb5

Nomearod Claude Opus 4.7 (1M context) committed on

docs+build: judge-layer v1 coupled-artifact updates
508e5ef

Nomearod Claude Opus 4.7 (1M context) committed on

refactor(metrics): delete superseded LLM judges (answer_faithfulness etc.)
281b43d

Nomearod Claude Opus 4.7 (1M context) committed on

refactor(harness): migrate to per-dimension Judge layer (drop faithfulness/correctness)
e76227f

Nomearod Claude Opus 4.7 (1M context) committed on

feat(config): add evaluation.judge_dimensions field
12cb8b7

Nomearod Claude Opus 4.7 (1M context) committed on

feat(calibration): generate_kappa_table with strict/warn modes
1d47106

Nomearod Claude Opus 4.7 (1M context) committed on

feat(scripts): run_calibration.py orchestrator for Steps A/C/D
4fa7c61

Nomearod Claude Opus 4.7 (1M context) committed on

feat(calibration): six row configs for the κ ablation table
cf57f16

Nomearod Claude Opus 4.7 (1M context) committed on

feat(goldens): add source_snippets to 8 FastAPI calibration items
a48afb9

Nomearod Claude Opus 4.7 (1M context) committed on

feat(calibration): 30-item stratified calibration_v1 sample
8ef480a

Nomearod Claude Opus 4.7 (1M context) committed on

test(calibration): sklearn-parity fixtures + cross-check CI test
3a2ed35

Nomearod Claude Opus 4.7 (1M context) committed on

feat(calibration): hand-rolled cohen_kappa, gwets_ac2, bootstrap_ci
6ef2e0e

Nomearod Claude Opus 4.7 (1M context) committed on

feat(variance): PermutedJudge + Jury — N permutations and multi-judge aggregator
c038a7d

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): CitationFaithfulnessJudge with all-or-nothing aggregation
04d9ea0

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): CompletenessJudge + three-point reference-based rubric
80be2d8

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): RelevanceJudge + anchored three-point rubric
b170eb6

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): GroundednessJudge + anchored binary rubric
30a5e0c

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): _call_judge_with_retry helper with strict-reprompt + abstain
ff78845

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): MockJudge with LookupError on missing keys
aa70e89

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): Judge ABC with judge_id derived from model + dimension
2192305

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): Rubric markdown loader with aggressive validation
7b72b2c

Nomearod Claude Opus 4.7 (1M context) committed on

feat(judges): ScoreResult + abstain-reason constants
76e370c

Nomearod Claude Opus 4.7 (1M context) committed on

test: scaffold tests/evaluation/ directory for judge-layer tests
f94cea7

Nomearod Claude Opus 4.7 (1M context) committed on

ci: document zero-secret contract on test job with empty env block
86ddcb7

Nomearod Claude Opus 4.7 (1M context) committed on

chore(tooling): exclude scripts/_dev/ from ruff and mypy
d15fbd3

Nomearod Claude Opus 4.7 (1M context) committed on

docs(plans): judge-layer v1 implementation plan — 12 phases, ~50 tasks
171022a

Nomearod Claude Opus 4.7 (1M context) committed on

docs(plans): judge-layer v1 design — supersede continuous-scale judges with discrete-anchored 2-judge jury + κ-validated calibration
44c65d4

Nomearod Claude Opus 4.7 (1M context) committed on

docs(readme): correct test count 444 → 443
0e96cb9

Nomearod Claude Opus 4.7 (1M context) committed on

Merge remote-tracking branch 'origin/main' into hf-deploy
4161c3e

Nomearod committed on

Merge pull request #17 from tyy0811/dashboard-v3
fcfd067
unverified

Jane Yeung committed on

Redesign landing page: paper+ink visual system, instrumented pipeline, OWASP badges
82b6725

Nomearod Claude Opus 4.7 (1M context) committed on

Merge remote-tracking branch 'origin/main' into hf-deploy
efffb61

Nomearod committed on

Merge pull request #16 from tyy0811/feat/chip-row-owasp-coverage-subtitle
a9409b2
unverified

Jane Yeung committed on

feat(landing): OWASP coverage subtitle + LLM05 tooltip on corpus chips
414c372

Nomearod Claude Opus 4.7 (1M context) committed on

Merge remote-tracking branch 'origin/main' into hf-deploy
63d835d

Nomearod committed on

Merge pull request #15 from tyy0811/docs/security-llm07-residual-risk
ddda523
unverified

Jane Yeung committed on

docs(security): LLM07 named residual risk — injection classifier coverage gap
13317a0

Nomearod Claude Opus 4.7 (1M context) committed on

Merge remote-tracking branch 'origin/main' into hf-deploy
c750a10

Nomearod committed on