SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use • Paper • 2607.01874 • Published 2 days ago • 12
A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets • Paper • 2606.13802 • Published 23 days ago • 1
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players • Paper • 2605.28816 • Published May 27 • 431
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining • Paper • 2605.14747 • Published May 14 • 147
KoichiYasuoka/gpt2-small-japanese-wikipedia-juman-ud-causal • Token Classification • Updated May 21 • 2 • 1
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution • Paper • 2605.18401 • Published May 18 • 130
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks • Paper • 2605.10977 • Published May 9 • 10
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation • Paper • 2605.10912 • Published May 11 • 46