Papers
arxiv:2606.16541

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

Published on Jun 15
Authors:
,

Abstract

Bidirectional Provability Fingerprinting framework certifies faithfulness in autoformalization by analyzing consequence neighborhoods and using novel components like counterfactual probe generation and adaptive budget allocation.

Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by faithfulness: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce Bidirectional Provability Fingerprinting (), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) Counterfactual Probe Generation (), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the Equivalence Spectrum, a continuous faithfulness score that replaces brittle binary verdicts; (iii) Adaptive Probe Budget Allocation (), an information-theoretic budget router; and (iv) Faithfulness-Guided Decoding (), which uses signals as a reward during autoformalization. We prove a drift detection theorem and a PAC-faithfulness result establishing that the equivalence class of a natural language statement is learnable from O(log(1/δ)/varepsilon) probes under mild assumptions. We release , a benchmark of 2{,}183 NL/Lean~4 pairs with controlled drift labels across six subfields of mathlib4. \,+\, detects 89.6% of drifted formalizations at a 3.0% false-positive rate-against 41.2% for typecheck and 63.3% for LLM-judge baselines, and reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by 47%. https://pmlrbd.github.io/BPF/

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.16541 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.16541 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.16541 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.