Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information
Abstract
Language models exhibit truth bias due to compression pressure and internal consistency preferences rather than inherent truth-seeking, as demonstrated through controlled synthetic datasets with varying error structures.
Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression-Consistency Principle: next-token prediction favors hypotheses that allow shorter and more internally consistent descriptions of the training data. Truth bias emerges only when false alternatives are structurally harder to compress. We test this using small GPT-2-style character-level transformers (3.5M-86M parameters) on synthetic math corpora with controlled mixtures of correct and incorrect rules. In the random-error setting, models strongly prefer correct completions in paired evaluation: 83.1% accuracy at balanced data and 67.0% even when correct rules appear in only 10% of the corpus. Replacing random errors with a coherent but mathematically incorrect rule system largely eliminates the preference (near-chance accuracy). In a more natural-language-like synthetic world, the effect is weaker but still present (57.7%). Additional experiments show that embedding verification steps can restore preference for correctness even at small scale, while increasing the number of consistent rules produces a graded improvement in accuracy. Our results suggest that what appears as a "truth bias" is largely a side effect of compression pressure and preference for internal consistency, rather than an intrinsic drive toward truth. Full code and data are available at https://github.com/Rai220/compression-drives-truth.
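The paired evaluation described above can be sketched in miniature. The snippet below stands in a character-level bigram model for the paper's GPT-2-style transformers (that substitution, the corpus construction, and all helper names such as `prefers_correct` are our own assumptions, not the paper's code): train on a synthetic math corpus mixing correct facts with random-error variants, then score matched correct/incorrect completions by log-likelihood and count how often the correct one wins.

```python
import math
import random
from collections import defaultdict

def train_bigram(corpus, alpha=1.0):
    """Fit an add-alpha-smoothed character bigram model (a stand-in LM)."""
    counts = defaultdict(lambda: defaultdict(float))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    vocab = sorted(set(corpus))
    return counts, vocab, alpha

def logprob(model, text):
    """Total log-probability of a string under the bigram model."""
    counts, vocab, alpha = model
    lp = 0.0
    for a, b in zip(text, text[1:]):
        total = sum(counts[a].values()) + alpha * len(vocab)
        lp += math.log((counts[a][b] + alpha) / total)
    return lp

# Synthetic corpus: correct additions mixed with random-error versions
# of the same questions (the paper's "random-error setting", toy scale).
rng = random.Random(0)
correct = [f"{i}+{j}={i + j};" for i in range(5) for j in range(5)]
wrong = [f"{i}+{j}={rng.randint(0, 9)};" for i in range(5) for j in range(5)]
corpus = "".join(correct * 3 + wrong)

model = train_bigram(corpus)

def prefers_correct(good, bad):
    """Paired evaluation: does the correct completion score higher?"""
    return logprob(model, good) > logprob(model, bad)

pairs = list(zip(correct, wrong))
accuracy = sum(prefers_correct(g, b) for g, b in pairs if g != b) / len(pairs)
print(f"paired-evaluation accuracy: {accuracy:.2f}")
```

A bigram model only captures local statistics, so its accuracy here merely illustrates the protocol; the paper's numbers come from transformers trained on much larger controlled corpora.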
Community
What if LLMs don’t actually love truth, but simply love whatever compresses better?
We deliberately poisoned the training data with contradictions (correct and deliberately wrong answers to the same questions) and discovered something fascinating:
- Random contradictions -> strong truth bias that scales with model size (65% -> 85%)
- One coherent false rule system -> truth bias almost completely disappears (~50%)
- Two competing false rule systems -> dramatic phase transition back to 78–88% truth preference
This suggests that the famous "inductive bias toward truth" is largely an artifact of compression. Truth wins because it’s typically the shortest and most consistent way to describe the data.
Introducing the >> Compression-Consistency Principle << with important implications for hallucinations, alignment, and synthetic data generation.
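The intuition behind the principle can be made concrete with a toy two-part (MDL-style) code-length calculation. Everything below, including the bit costs and the `description_length` helper, is our own illustrative assumption, not the paper's formalism: a single rule plus a short exception list describes mostly-consistent data in fewer bits than storing every answer literally.

```python
import math

def description_length(n_items, n_exceptions, answer_bits=8, rule_bits=16):
    """L(H) + L(D|H): cost of one rule, plus (index, literal answer)
    for every item the rule gets wrong. Bit budgets are illustrative."""
    index_bits = math.log2(max(n_items, 2))
    return rule_bits + n_exceptions * (index_bits + answer_bits)

n = 1000
# Hypothesis A: the correct rule, covering 90% of a mixed corpus;
# the remaining 10% (random errors) must be listed as exceptions.
dl_rule = description_length(n, n_exceptions=100)
# Hypothesis B: no rule at all; every answer is stored literally.
dl_literal = description_length(n, n_exceptions=n, rule_bits=0)
print(dl_rule < dl_literal)  # prints True: the rule-based code is shorter
```

This is why a single coherent false rule system kills the effect: it compresses exactly as well as the true rule, so compression pressure no longer distinguishes them.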
Full paper + code and experiments:
https://github.com/Rai220/compression-drives-truth/blob/master/paper_v2.md