π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • Article • Published Feb 4
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark • Paper • 2510.26802 • Published Oct 30
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity • Paper • 2510.01171 • Published Oct 1
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training • Paper • 2510.06710 • Published Oct 8
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published Sep 2
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter • Paper • 2407.11298 • Published Jul 16, 2024
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI • Paper • 2510.05684 • Published Oct 7
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities • Paper • 2510.08759 • Published Oct 9
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning • Paper • 2506.09049 • Published Jun 10