arxiv:2601.15703

Agentic Uncertainty Quantification

Published on Jan 22
· Submitted by Jiaxin Zhang on Jan 23
Abstract

A unified dual-process framework transforms verbalized uncertainty into active control signals for improved reasoning reliability in AI agents.

AI-generated summary

Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," where early epistemic errors propagate irreversibly. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors, only diagnosing risks without addressing them, while self-reflection mechanisms suffer from continuous or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This enables the agent to balance efficient execution and deep deliberation dynamically. Extensive experiments on closed-loop benchmarks and open-ended deep research tasks demonstrate that our training-free approach achieves superior performance and trajectory-level calibration. We believe this principled AUQ framework represents a significant step towards reliable agents.

Community

Paper author Paper submitter

🛑 Stop the "Spiral of Hallucination" in Autonomous Agents!

Long-horizon agents often fail because minor early errors snowball into irreversible failures. We introduce Agentic Uncertainty Quantification (AUQ), a training-free Dual-Process framework inspired by System 1/System 2 thinking:

  • 🧠 System 1 (Fast): Uncertainty-Aware Memory propagates doubt to prevent blind commitment.
  • 🤔 System 2 (Slow): Triggers active reflection only when confidence drops below a specific threshold.
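The dual-process idea above can be sketched as a confidence-gated agent step: every step is logged with its verbalized confidence and rationale (System 1), and a slow reflection path fires only when confidence falls below a threshold (System 2). This is a minimal illustrative sketch, not the paper's actual implementation; all names (`Step`, `Agent`, `REFLECTION_THRESHOLD`) and the placeholder reflection logic are assumptions.

```python
# Illustrative sketch of a dual-process, uncertainty-gated agent loop.
# All names and the threshold value are hypothetical, not the paper's API.
from dataclasses import dataclass, field

REFLECTION_THRESHOLD = 0.5  # confidence below this triggers System 2


@dataclass
class Step:
    action: str
    confidence: float  # verbalized confidence in [0, 1]
    explanation: str   # semantic rationale accompanying the confidence


@dataclass
class Agent:
    # System 1: uncertainty-aware memory; doubt is carried forward, not dropped.
    memory: list = field(default_factory=list)

    def act(self, step: Step) -> str:
        # System 1 (fast): always record confidence + explanation so later
        # decisions are conditioned on earlier doubt.
        self.memory.append(step)
        # System 2 (slow): reflect only when confidence drops below threshold.
        if step.confidence < REFLECTION_THRESHOLD:
            return self.reflect(step)
        return step.action

    def reflect(self, step: Step) -> str:
        # Targeted resolution cued by the explanation (e.g. re-query a tool or
        # re-check the uncertain premise). Here: a placeholder revision.
        return f"revised({step.action}) given: {step.explanation}"


agent = Agent()
print(agent.act(Step("open drawer", 0.9, "drawer clearly visible")))  # fast path
print(agent.act(Step("take key", 0.3, "unsure key is in drawer")))    # reflects
```

The key design point is that reflection is conditional rather than continuous, which is what makes the approach cheaper than always-on self-reflection while still catching low-confidence steps before they snowball.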

Key Wins:

  • ✅ SOTA Performance: Outperforms ReAct & Reflexion on ALFWorld, WebShop, and the new DeepResearch Bench.
  • ✅ Efficiency: Prevents long, futile failure loops, making it more token-efficient than standard methods.
  • ✅ Plug-and-Play: No fine-tuning required.

From "Passive Diagnosis" to "Active Control": make your agents reliable! 🚀

