Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation • Paper • 2602.16705 • Published 2 days ago • 26
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning • Paper • 2602.07845 • Published 12 days ago • 68
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models • Paper • 2602.02185 • Published 18 days ago • 125
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots • Paper • 2602.00919 • Published 20 days ago • 284