AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model Paper • 2604.19747 • Published 17 days ago • 39
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System Paper • 2604.14125 • Published 23 days ago • 21
Article Building Autonomous Vehicles That Reason with the NVIDIA Alpamayo Open Ecosystem Jan 5 • 26
From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation Paper • 2603.15600 • Published Mar 16 • 7
UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data Paper • 2603.05312 • Published Mar 5 • 7
BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models Paper • 2602.08392 • Published Feb 9 • 3
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing Paper • 2510.05213 • Published Oct 6, 2025 • 6
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11, 2025 • 62
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Paper • 2508.20072 • Published Aug 27, 2025 • 32
HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents Paper • 2508.02629 • Published Aug 4, 2025 • 6
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 33
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published May 13, 2024 • 20