PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 12 days ago • 63
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published May 25 • 103
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing Paper • 2603.19224 • Published Mar 19 • 18
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21, 2025 • 37