- EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models — arXiv:2602.04515, published 2 days ago
- The Era of Agentic Organization: Learning to Organize with Language Models — arXiv:2510.26658, published Oct 30, 2025
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning — arXiv:2508.05405, published Aug 7, 2025
- Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills — arXiv:2503.12533, published Mar 16, 2025
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia — arXiv:2503.07920, published Mar 10, 2025
- Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning — arXiv:2503.07002, published Mar 10, 2025
- MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents — arXiv:2410.03450, published Oct 4, 2024