InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 16 days ago • 306
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents Paper • 2603.08013 • Published 24 days ago • 14
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published Feb 26 • 44
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments Paper • 2602.11964 • Published Feb 12 • 12
DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning Paper • 2602.11089 • Published Feb 11 • 18
MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models Paper • 2508.06009 • Published Aug 8, 2025 • 16