Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization Paper • 2601.12993 • Published Jan 19, 2026 • 75
Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos Paper • 2512.13080 • Published Dec 15, 2025 • 17
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding Paper • 2511.16595 • Published Nov 20, 2025 • 10
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Article • Published Feb 4, 2025 • 191
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published Aug 7, 2025 • 64
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published Jul 21, 2025 • 34
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16, 2025 • 68
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4, 2024 • 36