Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation Paper • 2606.26907 • Published 4 days ago • 44
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 12 days ago • 64
Sumi: Open Uniform Diffusion Language Model from Scratch Paper • 2606.19005 • Published 12 days ago • 11
Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification Paper • 2606.18249 • Published 13 days ago • 14
PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions Paper • 2606.14832 • Published 17 days ago • 12
Memento: Reconstruct to Remember for Consistent Long Video Generation Paper • 2606.14667 • Published 17 days ago • 17
VisualClaw: A Real-Time, Personalized Agent for the Physical World Paper • 2606.16295 • Published 14 days ago • 28
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 19 days ago • 205