InterleaveThinker: Reinforcing Agentic Interleaved Generation Paper • 2606.13679 • Published 15 days ago • 80
Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning Paper • 2606.07436 • Published 21 days ago • 24
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning Paper • 2605.21487 • Published May 20 • 23
From Web to Pixels: Bringing Agentic Search into Visual Perception Paper • 2605.12497 • Published May 12 • 14
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published May 6 • 106
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published Apr 2 • 152
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published Mar 31 • 49
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data Paper • 2603.25319 • Published Mar 26 • 32
RISE: Self-Improving Robot Policy with Compositional World Model Paper • 2602.11075 • Published Feb 11 • 29
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published Feb 2 • 118
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 155
AdaTooler-V: Adaptive Tool-Use for Images and Videos Paper • 2512.16918 • Published Dec 18, 2025 • 14
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization Paper • 2511.23002 • Published Nov 28, 2025 • 26
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models Paper • 2512.10867 • Published Dec 11, 2025 • 16