CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 26 days ago • 51
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published Jun 1 • 57