# Cell 19 — Final Evaluation (Post-Training LoRA)

`eval_final(checkpoint, ..., baseline=baseline_report)` runs the trained LoRA
on the **same** 50 paired episodes used by the baseline (evaluation.md §3.1)
and stores the paired-difference 95% CIs under
`EvalReport.breakdown['paired_ci']`.
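A minimal sketch of a paired percentile bootstrap for one metric. It assumes each episode yields one scalar score for the baseline and one for the LoRA run, with both lists in the same episode order. The helper name `paired_bootstrap_ci`, the 10,000 resamples and the percentile method are illustrative assumptions, not the project's implementation. Only the seed `20260428` comes from evaluation.md §2.4. Resampling per-episode differences is equivalent to resampling episode indices, so each draw keeps a baseline score together with its own LoRA counterpart. That is what makes the interval paired.

```python
import random
from statistics import fmean


def paired_bootstrap_ci(baseline_scores, lora_scores, n_resamples=10_000,
                        alpha=0.05, seed=20260428):
    """Hypothetical helper: percentile bootstrap CI for mean(lora - baseline).

    Assumes scores are aligned by episode (index i is the same episode in both lists).
    Returns (mean_difference, ci_low, ci_high).
    """
    if len(baseline_scores) != len(lora_scores) or not baseline_scores:
        raise ValueError("need equal-length, non-empty score lists (one per paired episode)")
    if n_resamples < 1:
        raise ValueError("n_resamples must be >= 1")

    # Per-episode paired differences; shared episode difficulty cancels here.
    diffs = [lora - base for base, lora in zip(baseline_scores, lora_scores)]
    n = len(diffs)

    # Fixed seed so the CI is reproducible run to run.
    rng = random.Random(seed)

    # Each resample draws n episodes with replacement and records the mean difference.
    boot_means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))

    # Percentile method: take the alpha/2 and 1 - alpha/2 order statistics.
    lo_idx = int(round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return fmean(diffs), boot_means[lo_idx], boot_means[hi_idx]
```

The interval is usually tighter than one built from two independent bootstraps when the two runs' scores are positively correlated across episodes. This is why the 50 episodes must be identical between runs.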

**Contract:** evaluation.md §2.1, §3.1, §3.3, §3.8, §5 `EpisodeSetLeakError`.

- `EpisodeSetLeakError` is raised at entry AND at exit if `baseline.episode_ids`
  differs from `val/briefs.jsonl[0:50]` or if the post-rollout report's episode
  IDs diverge from that set.
- Paired bootstrap CI seed = `20260428` (evaluation.md §2.4).
- Wall-clock budget: 20 min, the same ceiling as the baseline.
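The entry/exit guard can be sketched as follows under one reading of the contract. The entry check compares the baseline's IDs with the first 50 records of `val/briefs.jsonl`, and the exit check compares the post-rollout report's IDs with that same list. Several details are assumptions: the `episode_id` key, the helper names `expected_episode_ids`, `check_episode_set` and `run_with_episode_guard`, and the in-order comparison (pairing needs matching order, not just matching sets). Reporting the 20-minute budget as a flag rather than aborting is also an assumption. `EpisodeSetLeakError` is redefined locally only so the snippet runs standalone.

```python
import json
import time
from itertools import islice


class EpisodeSetLeakError(RuntimeError):
    """Local stand-in for the project's EpisodeSetLeakError (evaluation.md §5)."""


def expected_episode_ids(jsonl_lines, n=50, id_key="episode_id"):
    """Return the IDs of the first n non-blank JSONL records (id_key is assumed)."""
    records = (json.loads(line) for line in jsonl_lines if line.strip())
    return [rec[id_key] for rec in islice(records, n)]


def check_episode_set(actual_ids, expected_ids, where):
    """Raise if the IDs differ from the expected slice in content or order."""
    actual, expected = list(actual_ids), list(expected_ids)
    if actual != expected:
        missing = [i for i in expected if i not in actual]
        extra = [i for i in actual if i not in expected]
        raise EpisodeSetLeakError(
            f"{where}: episode IDs diverge from val/briefs.jsonl[0:50] "
            f"(missing={missing[:5]}, extra={extra[:5]})"
        )


def run_with_episode_guard(baseline_ids, expected_ids, rollout_fn, budget_s=20 * 60):
    """Hypothetical wrapper: check at entry, run rollouts, check at exit.

    rollout_fn is assumed to take the episode IDs and return the report's IDs.
    Returns (report_ids, within_budget) instead of aborting on a slow run.
    """
    check_episode_set(baseline_ids, expected_ids, "entry")
    start = time.monotonic()
    report_ids = rollout_fn(list(expected_ids))
    elapsed = time.monotonic() - start
    check_episode_set(report_ids, expected_ids, "exit")
    return report_ids, elapsed <= budget_s
```

In use, the expected list would come from the file itself, e.g. `with open("val/briefs.jsonl", encoding="utf-8") as fh: expected = expected_episode_ids(fh)`. Accepting any iterable of lines keeps the helper testable without touching disk.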