AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding Paper • 2606.06155 • Published 27 days ago • 10
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models Paper • 2605.10903 • Published May 11
RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark Paper • 2605.10921 • Published May 11 • 4
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation Paper • 2603.16669 • Published Mar 17 • 70
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 215
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models Paper • 2512.09928 • Published Dec 10, 2025 • 14
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published Nov 22, 2025 • 38
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Paper • 2511.01718 • Published Nov 3, 2025 • 7 • 1
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation Paper • 2510.14902 • Published Oct 16, 2025 • 17
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published Oct 14, 2025 • 149 • 4
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot Paper • 2403.13358 • Published Mar 20, 2024 • 3