Abstract
A unified dual-process framework transforms verbalized uncertainty into active control signals for improved reasoning reliability in AI agents.
Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," where early epistemic errors propagate irreversibly. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors, diagnosing risks without addressing them, while self-reflection mechanisms suffer from continual or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which uses these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This allows the agent to balance efficient execution against deep deliberation dynamically. Extensive experiments on closed-loop benchmarks and open-ended deep-research tasks show that our training-free approach achieves superior performance and trajectory-level calibration. We believe this principled AUQ framework represents a significant step toward reliable agents.
Community
Stop the "Spiral of Hallucination" in Autonomous Agents!
Long-horizon agents often fail because minor early errors snowball into irreversible failures. We introduce Agentic Uncertainty Quantification (AUQ), a training-free Dual-Process framework inspired by System 1/System 2 thinking:
- System 1 (Fast): Uncertainty-Aware Memory propagates doubt to prevent blind commitment.
- System 2 (Slow): Triggers active reflection only when confidence drops below a specific threshold.
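The dual-process loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `llm_step` and `reflect` callables, the `Step` record, and the `threshold` value are all hypothetical placeholders standing in for the framework's System 1 policy and System 2 reflection module.

```python
import dataclasses

@dataclasses.dataclass
class Step:
    action: str        # proposed next action
    confidence: float  # verbalized confidence in [0, 1]
    explanation: str   # semantic explanation of the uncertainty

def run_episode(llm_step, reflect, task, threshold=0.6, max_steps=10):
    """Dual-process sketch: System 1 carries uncertainty forward in memory;
    System 2 reflection fires only when confidence falls below threshold."""
    memory = []  # uncertainty-aware memory of past steps
    for _ in range(max_steps):
        step = llm_step(task, memory)           # System 1: fast, memory-conditioned
        if step.confidence < threshold:         # System 2 trigger condition
            step = reflect(task, memory, step)  # targeted, explanation-guided revision
        memory.append(step)                     # propagate confidence + explanation
        if step.action == "FINISH":
            break
    return memory
```

The key design point is that reflection is gated rather than continuous: confident steps pass straight through System 1, so deliberation cost is paid only where the agent's own verbalized uncertainty signals risk.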
Key Wins:
- SOTA performance: outperforms ReAct and Reflexion on ALFWorld, WebShop, and the new DeepResearch Bench.
- Efficiency: prevents long, futile failure loops, making it more token-efficient than standard methods.
- Plug-and-play: no fine-tuning required.
From "Passive Diagnosis" to "Active Control": make your agents reliable!
The following papers were recommended by the Semantic Scholar API
- EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs (2026)
- Aligning Agentic World Models via Knowledgeable Experience Learning (2026)
- Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts (2026)
- CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning (2026)
- SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search (2025)
- Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making (2025)
- EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines (2026)