
Cell 22: Markdown Summary Table (Baseline → Final)

print_summary_table(baseline, final) returns the multi-section markdown summary that ships in the HF blog and the DESIGN.md §15 pitch:

  1. Per-reward (mean + 95% CI): baseline → final → paired Δ with CI.
  2. Per-language: baseline reward_mean → final → Δ.
  3. Drift-detection latency: Stage 2/3 p50/p95 before vs after.
  4. Reward-hacking offenses: per-class baseline → final counts.
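
As an illustration of what one of these sections might look like, here is a minimal sketch of the per-language table (item 2). The helper name `render_language_section`, the plain language-to-reward_mean mappings, and the column layout are assumptions made for this example; they are not taken from the actual print_summary_table implementation.

```python
from typing import Mapping


def render_language_section(
    baseline: Mapping[str, float],
    final: Mapping[str, float],
) -> str:
    """Render a per-language markdown table: baseline -> final -> delta.

    Hypothetical helper: inputs map a language code to its reward_mean.
    """
    lines = [
        "| Language | Baseline | Final | Δ |",
        "| --- | ---: | ---: | ---: |",
    ]
    # Only languages present in both runs can be compared.
    for lang in sorted(set(baseline) & set(final)):
        b, f = baseline[lang], final[lang]
        # Rewards are shown to 3 decimals; the delta carries an explicit sign.
        lines.append(f"| {lang} | {b:.3f} | {f:.3f} | {f - b:+.3f} |")
    return "\n".join(lines)


# Illustrative numbers only, not real evaluation results.
table = render_language_section(
    {"en": 0.412, "hi": 0.305},
    {"en": 0.587, "hi": 0.490},
)
print(table)
```

Returning a string rather than printing inside the helper keeps each section easy to concatenate into the final multi-section summary.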

Contract: evaluation.md §3.3, §3.4, §3.5; DESIGN.md §13 deliverables #6 / #7. Numeric cells are rounded to 3 decimals (latency cells to 2). The paired Δ is pulled from final.breakdown['paired_ci'] (populated by eval_final in step_19).
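
The rounding rules above could be captured in a few small formatters, sketched below. The shape of `paired_ci` is an assumption here: the sketch treats it as a mapping from reward name to a `(delta, ci_low, ci_high)` tuple, and the reward name `task_success` is hypothetical. The real structure written by eval_final may differ.

```python
from typing import Mapping, Tuple


def fmt_reward(x: float) -> str:
    """Reward-style cells: 3 decimals."""
    return f"{x:.3f}"


def fmt_latency(x: float) -> str:
    """Latency cells: 2 decimals."""
    return f"{x:.2f}"


def fmt_paired_delta(
    paired_ci: Mapping[str, Tuple[float, float, float]],
    name: str,
) -> str:
    """Format a paired delta with its CI, or 'n/a' if the entry is missing.

    Assumed layout: paired_ci[name] == (delta, ci_low, ci_high).
    """
    if name not in paired_ci:
        return "n/a"
    delta, lo, hi = paired_ci[name]
    return f"{delta:+.3f} [{lo:+.3f}, {hi:+.3f}]"


# Illustrative values only; "task_success" is a made-up reward name.
paired_ci = {"task_success": (0.1234, 0.0456, 0.2012)}
cell = fmt_paired_delta(paired_ci, "task_success")
print(cell)
```

Falling back to `n/a` for a missing entry is one possible choice; raising an error instead would surface a step_19 run that never populated the paired CIs.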