HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Abstract
A theoretical framework and detection method for identifying hallucinations in large language models by analyzing data-driven and reasoning-driven components through neural tangent kernel-based scoring.
The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we propose HalluGuard, a score that leverages the geometry and representations induced by the neural tangent kernel (NTK) to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard across 10 diverse benchmarks, against 11 competitive baselines, and on 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting both forms of LLM hallucination.
Community
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Accepted at ICLR 2026
In this work, we introduce HalluGuard, a unified, theory-driven framework for hallucination detection in large language models.
Rather than treating hallucination as a single failure mode, HalluGuard explicitly decomposes hallucinations into data-driven and reasoning-driven components, and detects both at inference time, with no retraining, no labels, and no external references.
Key Takeaways
Two Sources of Hallucination
LLM hallucinations arise from two fundamentally different mechanisms:
Data-driven hallucinations
Errors rooted in biased, incomplete, or mismatched knowledge acquired during pretraining or finetuning.
Reasoning-driven hallucinations
Errors caused by instability and error amplification during multi-step autoregressive decoding.
Most existing detectors focus on only one of these. HalluGuard shows that real hallucinations often emerge from their interaction and evolve across decoding steps.
Hallucination Risk Bound (Theory)
We introduce a Hallucination Risk Bound, which formally decomposes total hallucination risk into:
- a representation bias term (training-time mismatch), and
- a decoding instability term (inference-time amplification).
The analysis reveals a key insight:
hallucinations originate from semantic approximation gaps and are then exponentially amplified during long-horizon generation.
This provides a principled explanation of how hallucinations emerge and evolve in LLMs.
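For intuition, a schematic form of such a bound might look like the following; the symbols and constants here are our own illustration of the decomposition described above, not the paper's exact statement:

```latex
% Schematic Hallucination Risk Bound (illustrative notation, not the
% paper's exact statement):
%   R_hall : total hallucination risk of a T-step generation
%   eps_rep: representation bias (training-time semantic approximation gap)
%   delta_t: per-step decoding instability at step t
%   gamma  : amplification factor; gamma > 1 makes instability compound
\[
  \mathcal{R}_{\mathrm{hall}}
  \;\le\;
  \underbrace{\varepsilon_{\mathrm{rep}}}_{\text{representation bias}}
  +
  \underbrace{\sum_{t=1}^{T} \gamma^{t}\,\delta_{t}}_{\text{decoding instability}},
  \qquad \gamma > 1 .
\]
```

With gamma > 1, even small per-step perturbations delta_t compound geometrically in the horizon T, which is exactly the exponential amplification during long-horizon generation described above.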
HalluGuard Score (Method)
Building on this theory, we propose HalluGuard, a lightweight NTK-based hallucination score:
Higher HalluGuard score → lower hallucination risk.
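To make the idea concrete, below is a minimal, self-contained sketch of an NTK-style score. The tiny stand-in model, the gradient feature map, and the pairwise-similarity aggregation are all our illustrative assumptions, not the paper's exact estimator:

```python
# Illustrative sketch of an NTK-style hallucination score (our assumptions,
# not the paper's exact estimator): per-step gradients of the chosen token's
# log-probability serve as empirical-NTK features, and a generation is scored
# by the average pairwise kernel similarity of its steps. Intuition: steps of
# a stable trajectory cohere in the NTK-induced geometry; unstable steps drift.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    """Minimal causal LM stand-in; any autoregressive model works here."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)  # (batch, seq, vocab) next-token logits

def step_grad_feature(model, ids, t):
    """Empirical-NTK feature for step t: the gradient of the chosen token's
    log-probability w.r.t. all model parameters, flattened."""
    model.zero_grad(set_to_none=True)
    logits = model(ids[:, : t + 1])
    logp = F.log_softmax(logits[0, t - 1], dim=-1)[ids[0, t]]
    logp.backward()
    grads = [p.grad.flatten() for p in model.parameters() if p.grad is not None]
    return torch.cat(grads).detach()

def halluguard_like_score(model, ids):
    """Mean pairwise cosine kernel between step-wise gradient features.
    Higher score -> steps cohere in the NTK geometry -> lower hallucination
    risk, matching the sign convention above."""
    feats = torch.stack(
        [step_grad_feature(model, ids, t) for t in range(1, ids.shape[1])]
    )
    feats = F.normalize(feats, dim=-1)
    K = feats @ feats.T                              # cosine-normalized NTK
    off_diag = K[~torch.eye(len(K), dtype=torch.bool)]
    return off_diag.mean().item()

model = TinyLM()
generation = torch.randint(0, VOCAB, (1, 12))  # stand-in for a decoded answer
print(f"NTK coherence score: {halluguard_like_score(model, generation):.4f}")
```

One design choice worth noting: using gradients of each chosen token's log-probability as features means the kernel reflects the NTK-induced geometry (how the model's parameters would have to move to make each step more likely) rather than raw hidden states alone.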
Strong Empirical Results
We evaluate HalluGuard across:
- 10 benchmarks (QA, math reasoning, instruction following),
- 11 competitive baselines, and
- 9 LLM backbones (from GPT-2 to 70B-scale models).
Results:
- Consistent state-of-the-art AUROC / AUPRC across all task families
- Especially strong gains on multi-step reasoning benchmarks (MATH-500, BBH)
- Robust detection of fine-grained semantic hallucinations (PAWS), even when surface forms are nearly identical
Beyond Detection: Test-Time Guidance
HalluGuard can also be used to guide test-time inference, significantly improving reasoning accuracy by steering generation away from unstable trajectories, without modifying or retraining the model.
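As a hedged sketch of what such guidance could look like, best-of-k reranking is our stand-in for whatever steering rule the paper actually uses; it builds on `TinyLM`, `VOCAB`, and `halluguard_like_score` from the previous sketch:

```python
# Illustrative best-of-k guided decoding (our sketch, not the paper's exact
# steering rule): sample k candidate continuations, score each full
# trajectory, and keep the one with the highest (most stable) score.

import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_continuation(model, prompt_ids, n_new, temperature=1.0):
    """Plain ancestral sampling of n_new tokens after prompt_ids."""
    ids = prompt_ids.clone()
    for _ in range(n_new):
        logits = model(ids)[0, -1] / temperature
        nxt = torch.multinomial(F.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, nxt.view(1, 1)], dim=1)
    return ids

def guided_decode(model, prompt_ids, score_fn, k=4, n_new=8):
    """Best-of-k reranking: keep the candidate whose trajectory looks most
    coherent under the NTK-style score (our illustrative steering rule)."""
    candidates = [sample_continuation(model, prompt_ids, n_new) for _ in range(k)]
    scores = [score_fn(model, c) for c in candidates]
    best = max(range(k), key=lambda i: scores[i])
    return candidates[best], scores[best]

# Reuses TinyLM / VOCAB / halluguard_like_score from the previous sketch.
prompt = torch.randint(0, VOCAB, (1, 4))
best_seq, best_score = guided_decode(TinyLM(), prompt, halluguard_like_score)
print(f"kept candidate with score {best_score:.4f}, length {best_seq.shape[1]}")
```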
Takeaway
HalluGuard (ICLR 2026) provides:
- a theoretical lens for understanding how hallucinations emerge and evolve, and
- a practical, plug-and-play detector for modern LLMs.
It bridges representation geometry and decoding dynamics, offering a unified foundation for reliable reasoning and uncertainty-aware inference.
Feedback and discussion are very welcome!