Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published Dec 23, 2025 • 86
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 126
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published Jan 5 • 109
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem Paper • 2512.24873 • Published Dec 31, 2025 • 105
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling Paper • 2512.23959 • Published Dec 30, 2025 • 112
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 131
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published Dec 31, 2025 • 151
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 166
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 225
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 211
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 14, 2025 • 187
ROOT: Robust Orthogonalized Optimizer for Neural Network Training Paper • 2511.20626 • Published Nov 25, 2025 • 43
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 209