AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems (Paper 2605.08715, published 4 days ago)
Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents (Paper 2509.25302, published Sep 29, 2025)
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought (Paper 2502.17214, published Feb 24, 2025)