Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published May 28 • 146
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models Paper • 2601.03309 • Published Jan 6 • 2
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts Paper • 2312.10763 • Published Dec 17, 2023 • 19