You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences Paper • 2606.15956 • Published 13 days ago • 12
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention Paper • 2606.09079 • Published 19 days ago • 64
Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models Paper • 2605.28132 • Published May 27 • 25
ESPO: Early-Stopping Proximal Policy Optimization Paper • 2605.29860 • Published about 1 month ago • 20
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion Paper • 2605.30351 • Published about 1 month ago • 26
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published May 27 • 75
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published about 1 month ago • 146
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published May 26 • 144
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation Paper • 2605.27366 • Published May 26 • 29
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published May 16 • 97
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20, 2025 • 73