Embarrassingly Simple Self-Distillation Improves Code Generation • Paper • 2604.01193 • Published 15 days ago • 40
HippoCamp: Benchmarking Contextual Agents on Personal Computers • Paper • 2604.01221 • Published 15 days ago • 29
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification • Paper • 2603.26648 • Published 20 days ago • 42
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation • Paper • 2603.28068 • Published 17 days ago • 13