AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 6 days ago • 40
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published 28 days ago • 87
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 30
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Paper • 2511.12609 • Published Nov 16, 2025 • 107
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Paper • 2511.02734 • Published Nov 4, 2025 • 22
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Paper • 2510.13344 • Published Oct 15, 2025 • 65
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8, 2025 • 28
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization Paper • 2505.19000 • Published May 25, 2025 • 42
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 56
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8, 2025 • 187