AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning Paper • 2601.18631 • Published 3 days ago • 45
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory Paper • 2601.16296 • Published 7 days ago • 26
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers Paper • 2601.14133 • Published 9 days ago • 57
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents Paper • 2601.16973 • Published 6 days ago • 35
LLM-in-Sandbox Elicits General Agentic Intelligence Paper • 2601.16206 • Published 7 days ago • 80
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 8 days ago • 42
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published 9 days ago • 51
More Images, More Problems? A Controlled Analysis of VLM Failure Modes Paper • 2601.07812 • Published 17 days ago • 6
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published 14 days ago • 38
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published 14 days ago • 26
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 15 days ago • 32
VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory Paper • 2601.08665 • Published 16 days ago • 8
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published 15 days ago • 51