Abstract
A unified dual-process framework transforms verbalized uncertainty into active control signals for improved reasoning reliability in AI agents.
Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," where early epistemic errors propagate irreversibly. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors, diagnosing risks without addressing them, while self-reflection mechanisms suffer from continual or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which uses these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This allows the agent to balance efficient execution against deep deliberation dynamically. Extensive experiments on closed-loop benchmarks and open-ended deep-research tasks show that our training-free approach achieves superior performance and trajectory-level calibration. We believe this principled AUQ framework represents a significant step toward reliable agents.
Community
Stop the "Spiral of Hallucination" in Autonomous Agents!
Long-horizon agents often fail because minor early errors snowball into irreversible failures. We introduce Agentic Uncertainty Quantification (AUQ), a training-free Dual-Process framework inspired by System 1/System 2 thinking:
- System 1 (Fast): Uncertainty-Aware Memory propagates doubt to prevent blind commitment.
- System 2 (Slow): Triggers active reflection only when confidence drops below a specific threshold.
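The dual-process loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `llm_step` and `reflect` callables, the `Step` record, and the `threshold` value are all hypothetical placeholders standing in for the framework's System 1 policy and System 2 reflection module.

```python
import dataclasses

@dataclasses.dataclass
class Step:
    action: str        # proposed next action
    confidence: float  # verbalized confidence in [0, 1]
    explanation: str   # semantic explanation of the uncertainty

def run_episode(llm_step, reflect, task, threshold=0.6, max_steps=10):
    """Dual-process sketch: System 1 carries uncertainty forward in memory;
    System 2 reflection fires only when confidence falls below threshold."""
    memory = []  # uncertainty-aware memory of past steps
    for _ in range(max_steps):
        step = llm_step(task, memory)           # System 1: fast, memory-conditioned
        if step.confidence < threshold:         # System 2 trigger condition
            step = reflect(task, memory, step)  # targeted, explanation-guided revision
        memory.append(step)                     # propagate confidence + explanation
        if step.action == "FINISH":
            break
    return memory
```

The key design point is that reflection is gated rather than continuous: confident steps pass straight through System 1, so deliberation cost is paid only where the agent's own verbalized uncertainty signals risk.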
Key Wins:
- SOTA performance: outperforms ReAct and Reflexion on ALFWorld, WebShop, and the new DeepResearch Bench.
- Efficiency: prevents long, futile failure loops, making it more token-efficient than standard methods.
- Plug-and-play: no fine-tuning required.
From "Passive Diagnosis" to "Active Control": make your agents reliable!
The following papers were recommended by the Semantic Scholar API
- EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs (2026)
- Aligning Agentic World Models via Knowledgeable Experience Learning (2026)
- Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts (2026)
- CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning (2026)
- SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search (2025)
- Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making (2025)
- EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines (2026)