	# Cell 19 — Final Evaluation (Post-Training LoRA)

	`eval_final(checkpoint, ..., baseline=baseline_report)` runs the trained LoRA
	on the same 50 paired episodes used by the baseline (evaluation.md §3.1)
	and stores the paired-difference 95% CIs under
	`EvalReport.breakdown['paired_ci']`.
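
As a rough illustration of what a paired-difference 95% CI over the shared episodes could look like, here is a minimal sketch. It assumes a percentile bootstrap over per-episode score differences; the function name `paired_bootstrap_ci`, the `n_resamples` default, and the nearest-rank percentile rule are hypothetical and are not taken from evaluation.md. Only the seed value comes from this cell.

```python
import random
from statistics import fmean

SEED = 20260428  # seed value stated in this cell (evaluation.md §2.4)


def paired_bootstrap_ci(baseline_scores, final_scores,
                        n_resamples=2000, seed=SEED, level=0.95):
    """Percentile bootstrap CI for the mean paired difference (final - baseline).

    Hypothetical helper: the real eval_final may compute its CIs differently.
    Scores must be aligned so that index i refers to the same episode in both lists.
    """
    if len(baseline_scores) != len(final_scores) or not baseline_scores:
        raise ValueError("score lists must be non-empty and aligned per episode")

    # Work on per-episode differences so each pair stays intact when resampling.
    diffs = [f - b for b, f in zip(baseline_scores, final_scores)]
    n = len(diffs)
    rng = random.Random(seed)  # local RNG: reproducible, no global state touched

    # Resample the differences with replacement and record each resample's mean.
    boot_means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))

    # Nearest-rank percentiles of the bootstrap distribution.
    alpha = 1.0 - level
    lo = boot_means[int((alpha / 2) * (n_resamples - 1))]
    hi = boot_means[int(round((1 - alpha / 2) * (n_resamples - 1)))]
    return fmean(diffs), lo, hi
```

Resampling the differences rather than the two score lists independently is what makes the interval "paired": variance shared by both runs on the same episode cancels out. A result shaped like `(mean_diff, lo, hi)` is the kind of value one might store under `EvalReport.breakdown['paired_ci']`, though the actual schema is not shown here.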

	Contract: evaluation.md §2.1, §3.1, §3.3, §3.8, §5 `EpisodeSetLeakError`.

	- `EpisodeSetLeakError` is raised at entry if `baseline.episode_ids` differs from
	`val/briefs.jsonl[0:50]`, and at exit if the post-rollout report's IDs diverge.
	- Paired bootstrap CI seed = `20260428` (evaluation.md §2.4).
	- Wall-clock budget: 20 min, the same ceiling as the baseline.
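
The entry and exit episode-set checks described in the list above might be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: the `EpisodeSetLeakError` class here is a local stand-in for the one named in evaluation.md §5, the helpers `expected_episode_ids` and `assert_same_episodes` are hypothetical, and the JSONL field name `episode_id` is a guess about the layout of `val/briefs.jsonl`.

```python
import json


class EpisodeSetLeakError(Exception):
    """Local stand-in for the error named in evaluation.md §5."""


def expected_episode_ids(briefs_path, n=50, id_field="episode_id"):
    """Read the IDs of the first n records of a JSONL file.

    The field name `episode_id` is an assumption about the file's schema.
    """
    ids = []
    with open(briefs_path, encoding="utf-8") as fh:
        for line in fh:
            if len(ids) == n:
                break
            if line.strip():  # skip blank lines
                ids.append(json.loads(line)[id_field])
    return ids


def assert_same_episodes(expected, actual, stage):
    """Raise if the two ID sequences differ.

    The comparison is order-sensitive (an assumption) so that episode i in the
    baseline stays paired with episode i in the final run.
    """
    if list(expected) != list(actual):
        raise EpisodeSetLeakError(
            f"{stage}: episode IDs differ from the expected validation slice"
        )
```

Under these assumptions, the entry check would be `assert_same_episodes(expected, baseline.episode_ids, "entry")` before any rollout, and the exit check would repeat the call with the post-rollout report's IDs and `"exit"`.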