CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? Paper • 2605.16679 • Published 5 days ago • 38
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published Feb 23 • 58
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025 • 16