"""Evaluation: CER/WER comparison + flagger metrics.

CER/WER for TrOCR raw vs TrOCR + post-correction, on IAM-GW and the
hand-labeled LoC holdout. Flagger ROC AUC + Brier score from the trained
model. Outputs a markdown table for the README.
"""

# TODO: implement main()