SketchVLM: Vision language models can annotate images to explain thoughts and guide users • Paper • 2604.22875 • Published 18 days ago • 34
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation • Paper • 2604.24764 • Published 14 days ago • 116
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning • Paper • 2604.24300 • Published 14 days ago • 65
Exploring Spatial Intelligence from a Generative Perspective • Paper • 2604.20570 • Published 19 days ago • 21
(1D) Ordered Tokens Enable Efficient Test-Time Search • Paper • 2604.15453 • Published 25 days ago • 18
MultiWorld: Scalable Multi-Agent Multi-View Video World Models • Paper • 2604.18564 • Published 21 days ago • 45
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation • Paper • 2604.18486 • Published 21 days ago • 90
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning • Paper • 2604.14568 • Published 25 days ago • 8
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments • Paper • 2604.14144 • Published 26 days ago • 63
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass • Paper • 2604.10966 • Published 28 days ago • 11
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation • Paper • 2604.10098 • Published about 1 month ago • 80
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver • Paper • 2604.08377 • Published Apr 9 • 289
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images • Paper • 2604.09531 • Published Apr 10 • 8