SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration (Paper • 2603.03823 • Published Mar 4)
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning (Paper • 2602.16742 • Published Feb 18)