SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published Feb 13 • 60
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Paper • 2604.05172 • Published Apr 6 • 24