EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 4 days ago • 84
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents Paper • 2601.16746 • Published 3 days ago • 62
Endless Terminals: Scaling RL Environments for Terminal Agents Paper • 2601.16443 • Published 3 days ago • 3
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents Paper • 2601.16344 • Published 4 days ago • 2
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 8 days ago • 47
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models Paper • 2601.15224 • Published 5 days ago • 12
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 10 days ago • 28
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published 13 days ago • 38
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Paper • 2601.10387 • Published 11 days ago • 10
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution Paper • 2601.10657 • Published 11 days ago • 20
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering Paper • 2601.10402 • Published 11 days ago • 36
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 18 days ago • 208
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence Paper • 2512.22334 • Published about 1 month ago • 35
Agentic Rubrics as Contextual Verifiers for SWE Agents Paper • 2601.04171 • Published 19 days ago • 11
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 22 days ago • 42