SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published 5 days ago • 64
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD Paper • 2603.20155 • Published 4 days ago • 7
FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow Paper • 2603.19598 • Published 4 days ago • 30
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining Paper • 2603.15030 • Published 8 days ago • 19
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published 5 days ago • 31
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Paper • 2603.18004 • Published 6 days ago • 12
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer Paper • 2603.19227 • Published 5 days ago • 40
Stereo World Model: Camera-Guided Stereo Video Generation Paper • 2603.17375 • Published 6 days ago • 11
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 8 days ago • 28
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training Paper • 2603.16139 • Published 7 days ago • 31
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published 10 days ago • 13
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding Paper • 2603.13366 • Published 15 days ago • 93
AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents Paper • 2603.14465 • Published 9 days ago • 22