CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation • 2506.19816 • Published Jun 24, 2025
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation • 2506.10966 • Published Jun 12, 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy • 2510.13778 • Published Oct 15, 2025
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation • 2507.17520 • Published Jul 23, 2025
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations • 2406.09401 • Published Jun 13, 2024
Unified Generative and Discriminative Training for Multi-modal Large Language Models • 2411.00304 • Published Nov 1, 2024