DSGym: A Holistic Framework for Evaluating and Training Data Science Agents Paper • 2601.16344 • Published 4 days ago
Endless Terminals: Scaling RL Environments for Terminal Agents Paper • 2601.16443 • Published 3 days ago
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents Paper • 2601.16746 • Published 3 days ago
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory Paper • 2601.16296 • Published 4 days ago
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 4 days ago
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 10 days ago
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 5 days ago
Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek Paper • 2601.15100 • Published 5 days ago
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning Paper • 2601.14750 • Published 5 days ago
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Paper • 2601.10387 • Published 11 days ago
FrankenMotion: Part-level Human Motion Generation and Composition Paper • 2601.10909 • Published 11 days ago
AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems Paper • 2601.11354 • Published 10 days ago
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search Paper • 2601.11037 • Published 10 days ago