Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models Paper • 2605.28132 • Published 9 days ago • 25
Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective Paper • 2605.27476 • Published 10 days ago • 20
Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use Paper • 2605.14038 • Published 23 days ago • 15
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools Paper • 2605.20682 • Published 16 days ago • 83
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 23 days ago • 270
SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion Paper • 2605.01466 • Published May 2 • 6
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 504
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published Apr 9 • 115
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 343
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published Mar 19 • 68
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 211
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published Feb 26 • 150
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 266