S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence Paper • 2606.20515 • Published 9 days ago • 39
Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion Paper • 2606.15236 • Published 11 days ago • 21
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 26 days ago • 233
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 30 days ago • 146
Function2Scene: 3D Indoor Scene Layout from Functional Specifications Paper • 2605.30819 • Published 29 days ago • 42
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published May 12 • 194
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published May 4 • 355
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published May 26 • 72
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects Paper • 2605.21572 • Published May 20 • 55
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published Apr 10 • 10
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published Apr 6 • 237
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 909
FileGram: Grounding Agent Personalization in File-System Behavioral Traces Paper • 2604.04901 • Published Apr 6 • 40
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published Mar 27 • 18
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published Apr 1 • 30
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2603.18118 • Published Mar 18 • 12
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory Paper • 2603.03269 • Published Mar 3 • 63
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published Mar 8 • 87