arxiv:2601.18753

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Published on Jan 26 · Submitted by Xinyue Zeng on Jan 27

Abstract

A theoretical framework and detection method for identifying hallucinations in large language models by analyzing data-driven and reasoning-driven components through neural tangent kernel-based scoring.

AI-generated summary

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we introduce HalluGuard, an NTK-based score that leverages the induced geometry and captured representations of the NTK to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard on 10 diverse benchmarks against 11 competitive baselines and across 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting diverse forms of LLM hallucinations.

Community


🚀 HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Accepted at ICLR 2026

In this work, we introduce HalluGuard, a unified, theory-driven framework for hallucination detection in large language models.
Rather than treating hallucination as a single failure mode, HalluGuard explicitly decomposes hallucinations into data-driven and reasoning-driven components, and detects both at inference time with no retraining, no labels, and no external references.


😆 Key Takeaways


🧠 Two Sources of Hallucination

LLM hallucinations arise from two fundamentally different mechanisms:

  • Data-driven hallucinations
    Errors rooted in biased, incomplete, or mismatched knowledge acquired during pretraining or finetuning.

  • Reasoning-driven hallucinations
    Errors caused by instability and error amplification during multi-step autoregressive decoding.

Most existing detectors focus on only one of these. HalluGuard shows that real hallucinations often emerge from their interaction and evolve across decoding steps.


πŸ“ Hallucination Risk Bound (Theory)

We introduce a Hallucination Risk Bound, which formally decomposes total hallucination risk into:

  • a representation bias term (training-time mismatch), and
  • a decoding instability term (inference-time amplification).

The analysis reveals a key insight:
hallucinations originate from semantic approximation gaps and are then exponentially amplified during long-horizon generation.

This provides a principled explanation of how hallucinations emerge and evolve in LLMs.
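
To give the decomposition a concrete shape, here is a schematic version of such a bound; the symbols $\varepsilon_{\text{repr}}$, $\lambda$, and $\delta_t$ are illustrative placeholders, not the paper's exact statement or constants:

$$\mathcal{R}_{\text{halluc}}(T) \;\le\; \underbrace{\varepsilon_{\text{repr}}}_{\text{representation bias}} \;+\; \underbrace{\textstyle\sum_{t=1}^{T} \lambda^{t}\, \delta_{t}}_{\text{decoding instability}}, \qquad \lambda > 1,$$

where a per-step instability $\delta_t$ at decoding step $t$ is geometrically amplified over the generation horizon $T$, matching the "exponentially amplified" insight above.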


πŸ” HalluGuard Score (Method)

Building on this theory, we propose HalluGuard, a lightweight NTK-based hallucination score:

$$\mathrm{HALLUGUARD}(u_h) = \det(K) + \log \sigma_{\max} - \log\!\big(\kappa(K)^2\big)$$

Higher HalluGuard score ⇒ lower hallucination risk.
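
For readers who want to play with the score, here is a minimal NumPy sketch. It assumes $K$ is the NTK Gram matrix over a generated answer's representations, $\sigma_{\max}$ its largest singular value, and $\kappa(K) = \sigma_{\max}/\sigma_{\min}$ its condition number; the paper's exact construction of $K$ from the model is not given in this post, so the Gram matrix below is a toy stand-in:

```python
import numpy as np

def halluguard_score(K: np.ndarray) -> float:
    # HalluGuard-style score: det(K) + log(sigma_max) - log(kappa(K)^2).
    # Higher score => better-conditioned NTK geometry => lower hallucination risk.
    svals = np.linalg.svd(K, compute_uv=False)   # singular values, descending
    sigma_max, sigma_min = svals[0], svals[-1]
    kappa = sigma_max / max(sigma_min, 1e-12)    # condition number, guarded
    return float(np.linalg.det(K) + np.log(sigma_max) - np.log(kappa ** 2))

# Toy stand-in for the NTK Gram matrix of a generated answer; how the paper
# actually builds K from the model is an assumption here.
rng = np.random.default_rng(0)
H = rng.normal(size=(8, 64))      # 8 hypothetical token representations
K = H @ H.T / H.shape[1]          # simple Gram matrix as a placeholder
print(halluguard_score(K))
```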


📊 Strong Empirical Results

We evaluate HalluGuard across:

  • 10 benchmarks (QA, math reasoning, instruction following),
  • 11 competitive baselines, and
  • 9 LLM backbones (from GPT-2 to 70B-scale models).

Results:

  • πŸ† Consistent state-of-the-art AUROC / AUPRC across all task families
  • πŸ” Especially strong gains on multi-step reasoning benchmarks (MATH-500, BBH)
  • 🧩 Robust detection of fine-grained semantic hallucinations (PAWS), even when surface forms are nearly identical

🧭 Beyond Detection: Test-Time Guidance

HalluGuard can also be used to guide test-time inference, significantly improving reasoning accuracy by steering generation away from unstable trajectories, without modifying or retraining the model.
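
The post does not spell out the guidance procedure, but one simple way to use a score like this at test time is best-of-$n$ reranking: sample several candidate continuations and keep the one the scorer prefers. A minimal sketch, where the `sample` and `score` callables are assumptions standing in for the model's sampler and the HalluGuard scorer (the paper's actual steering mechanism may operate differently):

```python
from typing import Callable

def guided_step(sample: Callable[[str], str],
                score: Callable[[str], float],
                prompt: str,
                n_candidates: int = 4) -> str:
    # Draw several candidate continuations and keep the one with the
    # highest score (higher HalluGuard score => lower hallucination risk).
    candidates = [sample(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: score(prompt + c))

# Toy usage with stand-in callables:
# best = guided_step(sample=lambda p: p + " ...", score=len, prompt="Q: ...")
```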


🔑 Takeaway

HalluGuard (ICLR 2026) provides:

  • a theoretical lens for understanding how hallucinations emerge and evolve, and
  • a practical, plug-and-play detector for modern LLMs.

It bridges representation geometry and decoding dynamics, offering a unified foundation for reliable reasoning and uncertainty-aware inference.

Feedback and discussion are very welcome 🙌
