Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 17 days ago • 126
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 29 days ago • 95
Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published Nov 19, 2025 • 54
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17, 2025 • 45
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 104
From Charts to Code: A Hierarchical Benchmark for Multimodal Models Paper • 2510.17932 • Published Oct 20, 2025 • 8
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6, 2025 • 120
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189
BaseReward: A Strong Baseline for Multimodal Reward Model Paper • 2509.16127 • Published Sep 19, 2025 • 21