% Auto-generated by eval/inject_validity.py — do not edit by hand.
\subsection{Validity of the Injected Slice}\label{app:inject-validity}

Following the TableEG-style audit, we classify every error cell (dirty vs.\ gold) with a deterministic taxonomy and compare the suite's injected errors (money-table seeds 7/17/27, $n=43{,}011$) against the $163{,}607$ real errors across the 42 paired sources (including hospital's 509).

\begin{table}[t]\centering\small
\caption{Error-type distributions (fraction of error cells), real vs.\ injected, pooled.}
\label{tab:inject-validity}
\begin{tabular}{lrr}\toprule
error type & real & injected \\ \midrule
typo & 0.386 & 0.454 \\
case & 0.009 & 0.214 \\
whitespace & 0.009 & 0.333 \\
encoding & 0.004 & 0.000 \\
numeric & 0.061 & 0.000 \\
date-format & 0.000 & 0.000 \\
token-swap & 0.000 & 0.000 \\
missing & 0.032 & 0.000 \\
other & 0.500 & 0.000 \\
\bottomrule\end{tabular}\end{table}

The injector produces only the recoverable surface classes it targets by design (typo, case, whitespace; injector--taxonomy agreement 0.997). Real errors, by contrast, are dominated by substitutions beyond edit distance~2 (other, 0.500) and short typos (0.386). Case and whitespace errors, which together make up 0.547 of the injected slice, account for only 0.018 of real errors, while numeric (0.061), missing-value (0.032), and encoding (0.004) errors occur only in the real slice.
Pooled Jensen--Shannon divergence is 0.526~bits out of a maximum of 1 (per-source median 0.398, range 0.212--1.000; hospital 0.398). The two slices are therefore \emph{not} interchangeable, which is why we report them separately and rest the grounding claim on the real slice.

Ranking preservation is partial: Kendall's $\tau_b$ between the system rankings induced by injected-slice and real-slice F1 is $0.33$ over the four cross-system rows and $0.80$ when the degenerate anchors (abstain-all, random-edit, oracle) are included.
The injected slice preserves the floor/ceiling ordering of the anchors, which accounts for the higher $\tau_b$ once they are included, but it ranks OpenRefine fingerprint above both our system and OpenRefine kNN, whereas on the real slice fingerprint ranks below both. Frequency clustering looks strong on injected data because the canonical form is, by construction, present and dominant there, a property real errors need not share. Injected-only evaluation would therefore overstate frequency-clustering baselines.
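The audit's classification rules are not spelled out in this section. As a hypothetical illustration only, the sketch below assigns each (dirty, gold) cell to one of the nine types in Table~\ref{tab:inject-validity} by checking rules in a fixed order, which is what makes such a classifier deterministic. The rule order, the date layouts in \texttt{DATE\_FORMATS}, and the Latin-1/UTF-8 mojibake test for \texttt{encoding} are assumptions, not the behaviour of \texttt{eval/inject\_validity.py}; the cutoff of 2 edits separating \texttt{typo} from \texttt{other} follows the text's description of other as substitutions beyond edit distance~2.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical rules: the audit script's actual precedence and formats may differ.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")  # assumed date layouts
TYPO_MAX_EDITS = 2  # larger edits fall through to "other"


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def as_float(s: str) -> Optional[float]:
    try:
        return float(s)
    except ValueError:
        return None


def as_date(s: str) -> Optional[date]:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    return None


def classify(dirty: str, gold: str) -> str:
    """Map one (dirty, gold) error cell, dirty != gold, to a single error type.

    Checks run in a fixed order, so the first matching rule wins.
    """
    if dirty.strip() == "":
        return "missing"
    if "".join(dirty.split()) == "".join(gold.split()):
        return "whitespace"  # differs only in whitespace
    if dirty.lower() == gold.lower():
        return "case"
    try:
        # Mojibake: UTF-8 bytes that were mis-decoded as Latin-1.
        if dirty.encode("latin-1").decode("utf-8") == gold:
            return "encoding"
    except UnicodeError:
        pass
    if as_float(dirty) is not None and as_float(gold) is not None:
        return "numeric"
    dirty_date = as_date(dirty)
    if dirty_date is not None and dirty_date == as_date(gold):
        return "date-format"
    if sorted(dirty.split()) == sorted(gold.split()):
        return "token-swap"
    if levenshtein(dirty, gold) <= TYPO_MAX_EDITS:
        return "typo"
    return "other"


print(classify("Bostn", "Boston"), classify("caf\u00c3\u00a9", "caf\u00e9"))  # → typo encoding
```

Because each rule tests a single kind of difference, a cell that mixes two (say, a leading space plus a lowercase initial against gold \texttt{Boston}) falls through to the edit-distance rules; a production taxonomy has to decide such mixed cases explicitly.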
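The pooled divergence can be checked approximately against the rounded entries of Table~\ref{tab:inject-validity}. The minimal sketch below, assuming base-2 logarithms and renormalizing each column (the rounded columns each sum to 1.001), computes $\mathrm{JSD}(P,Q)=\tfrac12\mathrm{KL}(P\Vert M)+\tfrac12\mathrm{KL}(Q\Vert M)$ with $M=\tfrac12(P+Q)$; on the table values it agrees with the reported 0.526~bits up to the rounding of the table entries. Per-source values would need the per-source counts, which this section does not list.

```python
import math

# Pooled fractions from Table tab:inject-validity (all-zero rows omitted).
REAL = {"typo": 0.386, "case": 0.009, "whitespace": 0.009, "encoding": 0.004,
        "numeric": 0.061, "missing": 0.032, "other": 0.500}
INJECTED = {"typo": 0.454, "case": 0.214, "whitespace": 0.333}


def normalize(dist: dict) -> dict:
    """Rescale to sum to 1; the rounded table columns each sum to 1.001."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


def kl_bits(p: dict, m: dict) -> float:
    """KL(p || m) in bits; types with p(x) = 0 contribute nothing."""
    return sum(px * math.log2(px / m[x]) for x, px in p.items() if px > 0)


def jsd_bits(p: dict, q: dict) -> float:
    """JSD(p, q) = KL(p || m)/2 + KL(q || m)/2 with m = (p + q)/2, in bits."""
    p, q = normalize(p), normalize(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in set(p) | set(q)}
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


print(f"pooled JSD = {jsd_bits(REAL, INJECTED):.3f} bits")
```

In bits the divergence is bounded by 1, reached only when the two distributions share no error type. Note that SciPy's \texttt{scipy.spatial.distance.jensenshannon} returns the square root of this quantity and uses the natural logarithm unless \texttt{base=2} is passed, so its raw output is not directly comparable to the figures above.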
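To make the cross-system $\tau_b=0.33$ concrete, the sketch below implements Kendall's $\tau_b$ with the usual tie correction and applies it to four systems. The F1 values are invented for illustration: only their ordering is meant to mirror the text (fingerprint below our system and kNN on the real slice, above both on the injected slice). We additionally assume our system ranks above kNN on both slices and add a hypothetical fourth baseline ranked last on both, since the text does not specify these positions. Under those assumptions, two of the six pairs are discordant, giving $(4-2)/6=0.33$.

```python
from itertools import combinations
from math import sqrt

# Hypothetical F1 scores; row order: ours, OpenRefine kNN,
# OpenRefine fingerprint, hypothetical fourth baseline.
REAL_F1 = [0.80, 0.60, 0.40, 0.20]      # fingerprint below ours and kNN
INJECTED_F1 = [0.70, 0.65, 0.90, 0.30]  # fingerprint jumps above both


def kendall_tau_b(x, y):
    """Kendall's tau-b: (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    P/Q count concordant/discordant pairs, T counts pairs tied only in x,
    U counts pairs tied only in y; pairs tied in both are ignored.
    """
    P = Q = T = U = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            T += 1
        elif dy == 0:
            U += 1
        elif (dx > 0) == (dy > 0):
            P += 1
        else:
            Q += 1
    return (P - Q) / sqrt((P + Q + T) * (P + Q + U))


print(round(kendall_tau_b(REAL_F1, INJECTED_F1), 2))  # → 0.33
```

\texttt{scipy.stats.kendalltau} computes $\tau_b$ by default and should agree with this sketch on untied inputs.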