Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models Paper • 2604.26951 • Published 11 days ago • 46
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published 20 days ago • 45
Manifold-Aware Exploration for Reinforcement Learning in Video Generation Paper • 2603.21872 • Published Mar 23 • 33
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published Mar 16 • 149
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published Jan 6 • 178
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published Dec 23, 2025 • 56
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text Paper • 2512.16924 • Published Dec 18, 2025 • 27
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025 • 27
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19, 2025 • 119
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published Jul 8, 2025 • 60
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10, 2025 • 34
Calligrapher: Freestyle Text Image Customization Paper • 2506.24123 • Published Jun 30, 2025 • 37
ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published May 26, 2025 • 20
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11, 2025 • 42
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published Apr 8, 2025 • 64