ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 29 days ago • 95
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning Paper • 2604.05404 • Published Apr 7 • 45
From Perception to Action: An Interactive Benchmark for Vision Reasoning Paper • 2602.21015 • Published Feb 24 • 24
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery Paper • 2602.08990 • Published Feb 9 • 79
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning Paper • 2509.25300 • Published Sep 29, 2025 • 8