OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published Feb 5 • 61
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published Feb 2 • 35
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 64
A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation Paper • 2601.09274 • Published Jan 14 • 84
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 106
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 73
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20, 2025 • 47
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26, 2025 • 104
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11, 2025 • 55
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21, 2025 • 45
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21, 2025 • 54
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published Mar 14, 2025 • 25
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17, 2025 • 51
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16, 2025 • 27
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 49