SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks Paper • 2410.03769 • Published Oct 2, 2024
Boosting LLM's Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning Paper • 2506.23056 • Published Jun 29, 2025
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems Paper • 2508.07407 • Published Aug 10, 2025
InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem Paper • 2602.14367 • Published Feb 16
AgentSearchBench: A Benchmark for AI Agent Search in the Wild Paper • 2604.22436 • Published 12 days ago