SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published about 1 month ago • 35
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 27 days ago • 118
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 27 days ago • 67
Exploring Spatial Intelligence from a Generative Perspective Paper • 2604.20570 • Published Apr 22 • 21
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published Apr 20 • 46
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published Apr 20 • 94
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning Paper • 2604.14568 • Published Apr 16 • 10
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published Apr 15 • 63
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Paper • 2604.10966 • Published Apr 13 • 12
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published Apr 11 • 81
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published Apr 10 • 8
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published Apr 8 • 40
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 325
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 187