NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? Paper • 2606.24530 • Published 3 days ago • 54
Article PhysicsIntern: from an Autonomous Benchmark-runner to a Research Sidekick dlouapre • 14 days ago • 6
Running physics-intern: an Autonomous Agent for Physics Research 📝 Explore an autonomous AI workflow for physics research • 54
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 29 days ago • 95
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published 28 days ago • 118
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published May 19 • 190
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning Paper • 2605.06326 • Published May 7 • 26
Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning Paper • 2510.01833 • Published Oct 2, 2025
QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry Paper • 2508.01670 • Published Aug 3, 2025
$δ$-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published May 12 • 131
PRBench: End-to-end Paper Reproduction in Physics Research Paper • 2603.27646 • Published Mar 29 • 29
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 134
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 78