FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing Paper • 2604.22586 • Published 12 days ago • 16
Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets Paper • 2604.22294 • Published 12 days ago • 16
LLM Safety From Within: Detecting Harmful Content with Internal Representations Paper • 2604.18519 • Published 16 days ago • 24
DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction Paper • 2604.21518 • Published 13 days ago • 28
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 12 days ago • 223
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models Paper • 2603.22003 • Published Mar 23 • 12
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published Mar 12 • 22
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought Paper • 2603.22847 • Published Mar 24 • 26
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published Mar 24 • 33
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published Mar 24 • 35
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published Mar 24 • 62
DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models Paper • 2603.23499 • Published Mar 24 • 51
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents Paper • 2603.22386 • Published Mar 23 • 57
SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM Paper • 2603.23386 • Published Mar 24 • 40
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published Mar 23 • 135
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published Mar 24 • 91
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding Paper • 2603.18472 • Published Mar 19 • 20