Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation Paper • 2603.12793 • Published 18 days ago
HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration Paper • 2603.07815 • Published 22 days ago
Can Vision-Language Models Solve the Shell Game? Paper • 2603.08436 • Published 22 days ago
From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space Paper • 2603.12648 • Published 18 days ago
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously Paper • 2603.12262 • Published 18 days ago
MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning Paper • 2603.12266 • Published 18 days ago
V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration Paper • 2603.13089 • Published 17 days ago
Visual-ERM: Reward Modeling for Visual Equivalence Paper • 2603.13224 • Published 17 days ago
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation Paper • 2603.11647 • Published 19 days ago
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? Paper • 2603.03194 • Published 27 days ago
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model Paper • 2602.21818 • Published Feb 25
Immersion in the GitHub Universe: Scaling Coding Agents to Mastery Paper • 2602.09892 • Published Feb 10
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published Nov 10, 2025
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization Paper • 2510.24592 • Published Oct 28, 2025