Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 8 days ago • 33
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published 8 days ago • 27
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 13 days ago • 42
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation Paper • 2603.28068 • Published 10 days ago • 13