# Cell 22: Markdown Summary Table (Baseline → Final)

`print_summary_table(baseline, final)` returns the multi-section markdown
summary that ships in the HF blog and DESIGN.md §15 pitch:

1. **Per-reward** (mean + 95% CI): baseline → final → paired Δ with CI.
2. **Per-language**: baseline reward_mean → final → Δ.
3. **Drift-detection latency**: Stage 2/3 p50/p95 before vs after.
4. **Reward-hacking offenses**: per-class baseline → final counts.
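
As an illustration of the baseline → final → Δ layout these sections share, here is a minimal sketch of how the per-language section could be rendered. The function name `render_language_section` and the plain `dict[str, float]` inputs are hypothetical, chosen only for this example; the actual `print_summary_table` may structure its inputs and columns differently.

```python
def render_language_section(
    baseline: dict[str, float], final: dict[str, float]
) -> str:
    """Render one markdown table row per language: baseline, final, delta.

    Hypothetical helper: inputs map a language code to its reward_mean.
    """
    lines = [
        "| Language | Baseline | Final | Δ |",
        "|---|---|---|---|",
    ]
    # Only languages present in both runs can have a delta.
    for lang in sorted(baseline.keys() & final.keys()):
        b, f = baseline[lang], final[lang]
        # 3-decimal rounding; the delta carries an explicit sign.
        lines.append(f"| {lang} | {b:.3f} | {f:.3f} | {f - b:+.3f} |")
    return "\n".join(lines)
```

Sorting the language keys keeps the row order stable between runs, which makes two generated summaries easier to diff.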

**Contract:** evaluation.md §3.3, §3.4, §3.5; DESIGN.md §13 deliverables #6 / #7.

Numeric cells are rounded to 3 decimals (latency to 2). The paired Δ is read from
`final.breakdown['paired_ci']` (populated by `eval_final` in step_19).
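
The rounding rule and the paired-Δ lookup could be sketched as below. This is not the cell's actual code: `fmt_cell` and `paired_delta_cell` are hypothetical helpers, and the shape assumed for each `paired_ci` entry (keys `delta`, `ci_low`, `ci_high`) is a guess made for illustration, since the source only names the `paired_ci` key itself.

```python
def fmt_cell(value: float, *, latency: bool = False) -> str:
    """Format a numeric cell: 2 decimals for latency, 3 otherwise."""
    return f"{value:.2f}" if latency else f"{value:.3f}"


def paired_delta_cell(breakdown: dict, reward: str) -> str:
    """Format the paired delta and its CI for one reward.

    Assumes (hypothetically) that breakdown['paired_ci'][reward] is a dict
    with 'delta', 'ci_low' and 'ci_high' keys. Returns 'n/a' when the
    entry is missing, e.g. if eval_final did not populate it.
    """
    entry = breakdown.get("paired_ci", {}).get(reward)
    if entry is None:
        return "n/a"
    return (
        f"{entry['delta']:+.3f} "
        f"[{entry['ci_low']:+.3f}, {entry['ci_high']:+.3f}]"
    )
```

Falling back to `n/a` rather than raising keeps the table renderable when a reward has no paired statistics, at the cost of hiding a missing upstream step; a stricter variant could raise instead.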