Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 12 days ago • 31
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published 18 days ago • 161
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction Paper • 2601.05966 • Published 17 days ago • 23
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 20 days ago • 99
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published 21 days ago • 29
SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published Dec 23, 2025 • 43
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published Dec 17, 2025 • 63
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 119
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published Dec 5, 2025 • 38
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published Dec 2, 2025 • 71
Thinking with Programming Vision: Towards a Unified View for Thinking with Images Paper • 2512.03746 • Published Dec 3, 2025 • 17
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published Dec 2, 2025 • 33
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 253
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 93