SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills Paper • 2605.24117 • Published May 22 • 22
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 52