"""Evaluation: CER/WER comparison + flagger metrics.

CER/WER for raw TrOCR vs. TrOCR + post-correction, on IAM-GW and the
hand-labeled LoC holdout.
Flagger ROC AUC + Brier score from the trained model.
Outputs a markdown table for the README.
"""

# TODO: implement main()
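
# A minimal sketch of the metrics named in the module docstring, assuming
# plain-Python implementations with no third-party dependencies. All names
# below (edit_distance, cer, wer, brier_score, roc_auc, markdown_table) are
# hypothetical helpers for illustration, not this project's actual API. The
# corpus-level (micro-averaged) CER/WER and the table columns are assumptions
# too; the final main() may aggregate or lay out results differently.


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete a reference item
                            curr[j - 1] + 1,          # insert a hypothesis item
                            prev[j - 1] + (r != h)))  # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / max(sum(len(r) for r in refs), 1)


def wer(refs, hyps):
    """Word error rate over whitespace-split tokens."""
    ref_tokens = [r.split() for r in refs]
    hyp_tokens = [h.split() for h in hyps]
    edits = sum(edit_distance(r, h) for r, h in zip(ref_tokens, hyp_tokens))
    return edits / max(sum(len(r) for r in ref_tokens), 1)


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for a
    small holdout but slow for large evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def markdown_table(rows):
    """Render (system, dataset, cer, wer) tuples as a markdown table."""
    lines = ["| System | Dataset | CER | WER |", "| --- | --- | --- | --- |"]
    for system, dataset, c, w in rows:
        lines.append(f"| {system} | {dataset} | {c:.4f} | {w:.4f} |")
    return "\n".join(lines)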