HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Abstract
A theoretical framework and detection method for identifying hallucinations in large language models by analyzing data-driven and reasoning-driven components through neural tangent kernel-based scoring.
The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we propose HalluGuard, a score that leverages the geometry and representations induced by the neural tangent kernel (NTK) to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard across 10 diverse benchmarks, against 11 competitive baselines, and on 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting both forms of LLM hallucination.
Community
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Accepted at ICLR 2026
In this work, we introduce HalluGuard, a unified, theory-driven framework for hallucination detection in large language models.
Rather than treating hallucination as a single failure mode, HalluGuard explicitly decomposes hallucinations into data-driven and reasoning-driven components, and detects both at inference time, with no retraining, no labels, and no external references.
Key Takeaways
Two Sources of Hallucination
LLM hallucinations arise from two fundamentally different mechanisms:
Data-driven hallucinations
Errors rooted in biased, incomplete, or mismatched knowledge acquired during pretraining or finetuning.
Reasoning-driven hallucinations
Errors caused by instability and error amplification during multi-step autoregressive decoding.
Most existing detectors focus on only one of these. HalluGuard shows that real hallucinations often emerge from their interaction and evolve across decoding steps.
Hallucination Risk Bound (Theory)
We introduce a Hallucination Risk Bound, which formally decomposes total hallucination risk into:
- a representation bias term (training-time mismatch), and
- a decoding instability term (inference-time amplification).
The analysis reveals a key insight:
hallucinations originate from semantic approximation gaps and are then exponentially amplified during long-horizon generation.
This provides a principled explanation of how hallucinations emerge and evolve in LLMs.
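For intuition, a schematic form of such a bound might look like the following; the symbols and constants here are our own illustration of the decomposition described above, not the paper's exact statement:

```latex
% Schematic Hallucination Risk Bound (illustrative notation, not the
% paper's exact statement):
%   R_hall : total hallucination risk of a T-step generation
%   eps_rep: representation bias (training-time semantic approximation gap)
%   delta_t: per-step decoding instability at step t
%   gamma  : amplification factor; gamma > 1 makes instability compound
\[
  \mathcal{R}_{\mathrm{hall}}
  \;\le\;
  \underbrace{\varepsilon_{\mathrm{rep}}}_{\text{representation bias}}
  +
  \underbrace{\sum_{t=1}^{T} \gamma^{t}\,\delta_{t}}_{\text{decoding instability}},
  \qquad \gamma > 1 .
\]
```

With gamma > 1, even small per-step perturbations delta_t compound geometrically in the horizon T, which is exactly the exponential amplification during long-horizon generation described above.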
HalluGuard Score (Method)
Building on this theory, we propose HalluGuard, a lightweight NTK-based hallucination score:
Higher HalluGuard score → lower hallucination risk.
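To make the idea concrete, below is a minimal, self-contained sketch of an NTK-style score. The tiny stand-in model, the gradient feature map, and the pairwise-similarity aggregation are all our illustrative assumptions, not the paper's exact estimator:

```python
# Illustrative sketch of an NTK-style hallucination score (our assumptions,
# not the paper's exact estimator): per-step gradients of the chosen token's
# log-probability serve as empirical-NTK features, and a generation is scored
# by the average pairwise kernel similarity of its steps. Intuition: steps of
# a stable trajectory cohere in the NTK-induced geometry; unstable steps drift.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    """Minimal causal LM stand-in; any autoregressive model works here."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)  # (batch, seq, vocab) next-token logits

def step_grad_feature(model, ids, t):
    """Empirical-NTK feature for step t: the gradient of the chosen token's
    log-probability w.r.t. all model parameters, flattened."""
    model.zero_grad(set_to_none=True)
    logits = model(ids[:, : t + 1])
    logp = F.log_softmax(logits[0, t - 1], dim=-1)[ids[0, t]]
    logp.backward()
    grads = [p.grad.flatten() for p in model.parameters() if p.grad is not None]
    return torch.cat(grads).detach()

def halluguard_like_score(model, ids):
    """Mean pairwise cosine kernel between step-wise gradient features.
    Higher score -> steps cohere in the NTK geometry -> lower hallucination
    risk, matching the sign convention above."""
    feats = torch.stack(
        [step_grad_feature(model, ids, t) for t in range(1, ids.shape[1])]
    )
    feats = F.normalize(feats, dim=-1)
    K = feats @ feats.T                              # cosine-normalized NTK
    off_diag = K[~torch.eye(len(K), dtype=torch.bool)]
    return off_diag.mean().item()

model = TinyLM()
generation = torch.randint(0, VOCAB, (1, 12))  # stand-in for a decoded answer
print(f"NTK coherence score: {halluguard_like_score(model, generation):.4f}")
```

One design choice worth noting: using gradients of each chosen token's log-probability as features means the kernel reflects the NTK-induced geometry (how the model's parameters would have to move to make each step more likely) rather than raw hidden states alone.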
Strong Empirical Results
We evaluate HalluGuard across:
- 10 benchmarks (QA, math reasoning, instruction following),
- 11 competitive baselines, and
- 9 LLM backbones (from GPT-2 to 70B-scale models).
Results:
- Consistent state-of-the-art AUROC / AUPRC across all task families
- Especially strong gains on multi-step reasoning benchmarks (MATH-500, BBH)
- Robust detection of fine-grained semantic hallucinations (PAWS), even when surface forms are nearly identical
Beyond Detection: Test-Time Guidance
HalluGuard can also be used to guide test-time inference, significantly improving reasoning accuracy by steering generation away from unstable trajectories, without modifying or retraining the model.
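As a hedged sketch of what such guidance could look like, best-of-k reranking is our stand-in for whatever steering rule the paper actually uses; it builds on `TinyLM`, `VOCAB`, and `halluguard_like_score` from the previous sketch:

```python
# Illustrative best-of-k guided decoding (our sketch, not the paper's exact
# steering rule): sample k candidate continuations, score each full
# trajectory, and keep the one with the highest (most stable) score.

import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_continuation(model, prompt_ids, n_new, temperature=1.0):
    """Plain ancestral sampling of n_new tokens after prompt_ids."""
    ids = prompt_ids.clone()
    for _ in range(n_new):
        logits = model(ids)[0, -1] / temperature
        nxt = torch.multinomial(F.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, nxt.view(1, 1)], dim=1)
    return ids

def guided_decode(model, prompt_ids, score_fn, k=4, n_new=8):
    """Best-of-k reranking: keep the candidate whose trajectory looks most
    coherent under the NTK-style score (our illustrative steering rule)."""
    candidates = [sample_continuation(model, prompt_ids, n_new) for _ in range(k)]
    scores = [score_fn(model, c) for c in candidates]
    best = max(range(k), key=lambda i: scores[i])
    return candidates[best], scores[best]

# Reuses TinyLM / VOCAB / halluguard_like_score from the previous sketch.
prompt = torch.randint(0, VOCAB, (1, 4))
best_seq, best_score = guided_decode(TinyLM(), prompt, halluguard_like_score)
print(f"kept candidate with score {best_score:.4f}, length {best_seq.shape[1]}")
```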
Takeaway
HalluGuard (ICLR 2026) provides:
- a theoretical lens for understanding how hallucinations emerge and evolve, and
- a practical, plug-and-play detector for modern LLMs.
It bridges representation geometry and decoding dynamics, offering a unified foundation for reliable reasoning and uncertainty-aware inference.
Feedback and discussion are very welcome!